Treatment of Cancer and Neurological Diseases

The present invention relates to the isolation of a nucleic acid molecule and the 
protein encoded thereby; antibodies raised thereto and the use of these products as 
therapeutic and/or diagnostic agents particularly, but not exclusively, in gene therapy
and/or tissue repair such as, without limitation, enhancing neuronal
repair/regeneration and in the treatment of cancer.

Background to the Invention 

Oral cancer has significant morbidity and mortality rates. In England and Wales the 
5-year survival is around 50%. Globally, oral cancer is one of the most common cancers
and in some parts of the world it is the most prevalent of all cancer types. For
example, in India and Sri Lanka oral cancer accounts for up to 40% of all diagnosed
cancers. In addition to geographic "hot spots", there seems to be a rising trend in the
incidence of oral cancers in many developed nations.

Recent advances in cancer management have failed to impact significantly on the 
outcome of oral cancer. Surgery and radiotherapy remain the principal forms of
treatment with a limited role for chemotherapy. Treatment can be mutilating and is
associated with high morbidity that significantly impacts on the quality of life. 
Speech, swallowing and taste can be markedly impaired after treatment. New 
treatment modalities are required for oral cancer therapy. 

Statement of the Invention

We have identified a gene, from human chromosome 8p23, which is deleted in oral
cancer. The gene was found to have distant similarity to the gene encoding the
protein "tolloid", and contains multiple Sushi and CUB domains. We believe that
this gene may have utility in diagnosis and gene therapy applications for oral and
other cancers.

Moreover, and surprisingly, the gene from human chromosome 8p23 may also be
implicated in aspects of the developmental regulation of neurogenesis. We base this
belief on our observations that the gene has similarity with tolloid, an important
developmental gene, and the fact that it is located in the autosomal recessive
microcephaly locus, MCPH1, critical region. Sequence variations in this gene can
segregate with microcephaly in some families. It therefore may have utility in the
diagnosis and therapy of microcephaly, as well as therapies directed to neuronal
repair and regeneration, including those utilising stem cells/neural progenitor cells.
Having identified this gene we believe that a further use is in the production of
transgenic animals. These may have an increased predisposition to oral cancer and/or
have decreased or potentially increased neocortex. Such animals would be useful not
only as models of oral cancer for the evaluation of novel therapeutics but also to
improve understanding of neurological developmental abnormalities. They would
also serve as models to test novel therapeutics for neuronal regeneration.

According to a first aspect of the present invention there is provided an isolated 
nucleic acid selected from the group consisting of: 

(a) DNA having the nucleotide sequence given herein as any one of SEQ
ID NOS:1 to 8;

(b) nucleic acids which hybridize to DNA of (a) above (e.g., under
stringent conditions);

(c) nucleic acids having between 75-95% homology with any one of the
nucleotide sequences given herein as SEQ ID NOS:1 to 8; and

(d) nucleic acids which differ from the DNA of (a), (b) or (c) above due
to the degeneracy of the genetic code.

DNAs of the present invention include those coding for proteins homologous to, and
having essentially the same biological properties as, the proteins disclosed herein,
and particularly the DNA disclosed herein as any one of SEQ ID NOS:1 to 8 and
encoding the proteins given herein as SEQ ID NOS:9 to 16. This definition is
intended to encompass natural allelic variations therein. Thus, isolated DNA or
cloned genes of the present invention can be of any species of origin, including
mouse, rat, rabbit, cat, porcine, and human, but are preferably of mammalian origin.
Thus, DNAs which hybridize to DNA disclosed herein as any one of SEQ ID NOS:1
to 8 (or fragments or derivatives thereof which serve as hybridization probes as
discussed below) and which code on expression for a protein of the present invention
(e.g., a protein according to any one of SEQ ID NOS:9 to 16), i.e. the protein lack of
which is associated with oral or other cancers and/or lack of neurogenesis, are to be
included in the definition.

Conditions which will permit other DNAs which code on expression for a protein of
the present invention to hybridize to the DNAs of SEQ ID NO:1 to 8 disclosed herein
can be determined in accordance with known techniques. For example, hybridization
of such sequences may be carried out under conditions of reduced stringency,
medium stringency or even stringent conditions (e.g., conditions represented by a
wash stringency of 35-40% Formamide with 5x Denhardt's solution, 0.5% SDS and
1x SSPE at 37°C; conditions represented by a wash stringency of 40-45%
Formamide with 5x Denhardt's solution, 0.5% SDS, and 1x SSPE at 42°C; and
conditions represented by a wash stringency of 50% Formamide with 5x Denhardt's
solution, 0.5% SDS and 1x SSPE at 42°C, respectively) to DNAs of SEQ ID NO:1 to
8 disclosed herein in a standard hybridization assay. See, e.g., J. Sambrook et al.,
Molecular Cloning, A Laboratory Manual (2d Ed. 1989) (Cold Spring Harbor
Laboratory). In general, sequences which code for proteins of the present invention
and which hybridize to the DNAs of SEQ ID NO:1 to 8 disclosed herein will
preferably be at least 75% homologous, 85% homologous, and even 95% homologous or
more with SEQ ID NO:1 to 8. Further, DNAs which code for proteins of the present
invention, or DNAs which hybridize to that given as any one of SEQ ID NOS:1 to 8,
but which differ in codon sequence from SEQ ID NO:1 to 8 due to the degeneracy of
the genetic code, are also an aspect of this invention. The degeneracy of the genetic
code, which allows different nucleic acid sequences to code for the same protein or
peptide, is well known in the literature. See, e.g., U.S. Patent No. 4,757,006 to Toole
et al. at Col. 2, Table 1.
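
By way of illustration only, the degeneracy of the genetic code referred to above can be sketched in a few lines of Python. The sketch uses the standard genetic code and the first 18 upper-case bases of the SEQ ID NO:1 listing; it assumes, for the purpose of the example, that the reading frame starts at the first upper-case ATG. The `variant` sequence and the helper names `translate` and `percent_identity` are hypothetical and are not part of the disclosure.

```python
# Standard genetic code laid out in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of matching positions between two equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# First 18 upper-case bases of the SEQ ID NO:1 listing (six codons, assumed frame).
listed = "ATGGAAGCCATAAAAACT"
# Hypothetical variant: synonymous codons substituted by hand for illustration.
variant = "ATGGAGGCTATTAAGACC"

print(translate(listed), translate(variant))
print(round(percent_identity(listed, variant), 1))
```

In this toy case both sequences translate to the same six-residue peptide although they match at only 13 of 18 nucleotide positions, which is why sequences differing through degeneracy are recited separately from those defined by percentage homology.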

According to a yet further aspect of the invention there is provided a nucleic acid
molecule which encodes a protein lack of which is associated with oral or other
cancers and/or lack of neurogenesis and comprises a nucleotide sequence which
hybridises to the nucleic acid of any one of SEQ ID NOS:1 to 8 under high
stringency conditions.

Preferably, hybridisation occurs under stringent conditions such as 1 x SSC, 0.1% 
SDS at 65 °C. 

Preferably, the nucleic acid is mammalian in origin, for example it may be human or
murine.

Preferably, the nucleic acid of the present invention is at least 2 kb and up to 12 kb
and may be, for example, 5.5 kb, the nucleic acid being located on chromosome
8p23.

According to a yet further aspect of the invention there is provided use of the nucleic
acid of the present invention, in determining loss of genomic material or loss of
expression of mRNA in selected target tissue(s) for diagnosing oral or other cancers
and/or neurological developmental abnormalities.

According to a yet further aspect of the invention there is provided use of the nucleic
acids of the present invention, in determining the presence of mutants in the DNA
and thus diagnosing patients suffering from oral or other cancers and/or neurological
developmental abnormalities.

According to a further aspect of the invention there is provided a polypeptide, or a
protein comprising an epitope for an antibody or a protein modified by one or more
amino acid modifications and comprising an epitope, or a fragment modified or
unmodified comprising an epitope for a protein lack of which is associated with oral
or other cancers and/or neurogenesis and encoded by SEQ ID NO:9 to 16. Ideally the
polypeptide is encoded by the nucleic acid molecule of any one of SEQ ID NO:1 to 8.

According to a yet further aspect of the invention there is provided a polypeptide or
protein encoded by the nucleic acids of the present invention, preferably the
sequences of which are as set forth in SEQ ID NOS:9 to 16.

According to a yet further aspect of the invention there is provided a delivery vehicle 
comprising the isolated nucleic acid molecule or polypeptide or protein of the present 
invention or antibodies to these.

Reference herein to the term delivery vehicle is intended to include any vector 
whether a viral vector or otherwise for example, without limitation, an adenovirus, a 
retrovirus, a herpesvirus, a plasmid, a phage, a phagemid or a liposome. 

Ideally said delivery vehicle is adapted for administration, for example, but without 
limitation, by suitable formulation into a suspension. 

More preferably, said delivery vehicle is adapted to deliver said nucleic acid
molecule or polypeptide to selected tissue. Thus the delivery vehicle is provided
with means to facilitate its binding and/or penetration to a specific target site. The
nature of the means comprises conventional technologies well known to those skilled
in the art, for example, without limitation, in the instance where the delivery vehicle
is a viral vector said viral vector is provided with surface protein adapted to ensure
the viral vector binds to and/or penetrates specific target tissues. Alternatively, gene
expression of any one of SEQ ID NOS:1 to 8 may be under the control of a tissue
specific promoter. Thus, in this way, the nucleic acid molecule or peptide, fragments
or derivatives thereof of the invention can be used in gene therapy treatments.

According to a yet further aspect of the invention there are provided antibodies raised
against the polypeptide, fragment or derivative thereof, of the invention. Ideally the
antibodies are monoclonal and more ideally genetically engineered to be humanised.

It will be apparent to those skilled in the art that the antibodies of the invention can
be used to determine the expression of the polypeptide of the invention in selected
target tissue and thus aid in the diagnosis of patients suffering from oral cancers
and/or neurological disorders.

According to a yet further aspect of the invention there is provided use of antibodies,
fragments or derivatives thereof in diagnosis/detection/identification of oral or other
cancers and/or neurological disorders. It will be appreciated that the antibodies as
well as the fragments or derivatives of the antibodies recognise the epitope and are
capable of binding to the antigenic protein. Also useful are recombinant antibodies.
The invention also includes antibodies and other compositions of matter which are
specific binding partners of the polyamino acids of the present invention. Reference
herein to polyamino acids is intended to include proteins and polypeptides.

The invention further provides for assays using the antibodies of the present
invention to detect individuals suffering from or having a predisposition towards oral
or other cancers and/or neurological disorders. The assays may employ labelling, for
example radioactive labels, enzymes, fluorescent compounds, chemiluminescent
compounds, bioluminescent compounds and metal chelates.

Typical assays include assays known to the skilled person for quantitative or
non-quantitative detection of antibodies and all involve contacting antigenic polypeptides
of the present invention with a sample. The assay may involve, for example and
without limitation, any one or more of the following techniques: RIA, EIA, ELISA and
sandwich assays.

According to a yet further aspect of the invention there is provided a method for the
treatment of oral cancers and/or neurological disorders comprising administering to a
patient suffering from these conditions the nucleic acid molecule or
polypeptide/protein of the present invention.



Preferably, the nucleic acid molecule and/or polypeptide/protein is administered by
the incorporation of said nucleic acid molecule or polypeptide/protein into a delivery
vehicle as herein described and ideally the method of treatment involves the use of 
gene therapy. 

According to a yet further aspect of the invention there is provided the nucleic acid
and/or protein, as hereinbefore described, for use as a pharmaceutical.

According to a yet further aspect of the invention there is provided use of the nucleic 
acid and/or protein of the present invention for the manufacture of a medicament for 
the treatment of oral or other cancers and/or neurological disorders. 

According to a yet further aspect of the invention there is provided a method of 
producing a transgenic non-human animal comprising disrupting a gene, or the 
effective part thereof, the gene comprising the nucleic acid of the present invention 
and/or the protein or effective part thereof of the present invention. 

Reference herein to disruption is intended to include complete or partial disruption of 
expression of the protein such that the transgenic animal is unable to express levels 
of the said protein that are typically found in normal individuals as compared with 
those suffering from oral cancer and/or neurological developmental abnormalities. 

Preferably, the transgenic mammal is a rodent and ideally a mouse and more 
preferably the gene encoding the protein lack of which is associated with oral cancer 
and/or neurogenesis is the nucleic acid molecule or fragment or derivative thereof as 
set forth in any one of SEQ ID NOS:1 to 8.

According to a yet further aspect of the invention there is provided a transgenic
non-human animal whose somatic and germ cells do not contain or express a gene
encoding a nucleic acid, or a nucleic acid which hybridises under high stringency
conditions to, the sequence as set forth in any one of SEQ ID NOS:1 to 8, the gene
having been deleted, mutated or disrupted in the animal or an ancestor of the animal
at an embryonic stage and wherein the gene may be operably linked to an inducible
promoter element.

Preferably, the transgenic mammal is a rodent and ideally a mouse. 

According to a yet further aspect of the invention there is provided a reporter gene
construct based on the promoter region of the gene, or effective part thereof, encoded
by any one of SEQ ID NOS:1 to 8, i.e. the nucleic acid of the present invention.

According to a yet further aspect of the invention there is provided use of a reporter
gene construct based on the promoter region of a gene, or effective part thereof,
encoded by any one of SEQ ID NOS:1 to 8 in the detection/screening of
pharmaceuticals and/or other compounds.

According to a yet further aspect of the invention there is provided a method of
determining the presence of or predisposition towards oral or other cancers and/or
neurological developmental abnormalities comprising:

(i) identifying the regions of said DNA sample that contain the nucleic
acid according to the present invention;
(ii) individually hybridising parallel samples of said DNAs with
oligonucleotides specific for alleles of the gene encoding any one of
said nucleic acids; and
(iii) identifying from among said DNA samples those with a loss of
heterozygosity for said alleles, wherein identification of a DNA
sample with a loss of heterozygosity indicates presence or a
predisposition towards neurological developmental abnormalities.

Preferably, the DNA sample is obtained from a human patient; alternatively, RNA
samples may be obtained and used in the method.

Preferably, step (i) may involve amplification of the DNA regions; typically,
amplification is by PCR.

Brief Description of the Figures 

The invention will now be described by way of example only with reference to the 
following Figures wherein:

Figure 1 represents haplotypes for nine markers from 8p22-pter, for families 1 and 2 
segregating autosomal recessive microcephaly. Unaffected siblings from family 1 
have been omitted, for clarity. Marker order and relative distances are presented here 
as deduced from the Généthon map: D8S504-3cM-D8S1824-3cM-D8S1798-3cM-
D8S277-2cM-D8S1819-5cM-D8S1825-13cM-D8S552-5cM-D8S1731-5cM- 
D8S261. 

Figure 2 represents sequenced BACs in this region from the human genome project.
Position of candidate gene sequences 5R-3V2 (SEQ ID NO:5) and 5G-3V2 (SEQ ID
NO:3) shown in blue (numbering corresponding to base-pair position in sequence).
Sequenced BACs shown in red. BAC clone contig of (Sun, 1999) shown in
black, and STSs derived from this contig shown mapped onto the sequenced BACs
by the vertical dashed black lines.

Figure 3 represents the relationship between SEQ ID NO:1 and the sequence variants
of SEQ ID NOS:2 to 8 (not to scale).

SEQ ID NOS:1 to 8 represent the nucleic acids of the present invention.

SEQ ID NOS:9 to 16 represent the corresponding protein sequences.

Materials and Methods
Subjects and Methods 

A family containing five individuals affected with primary autosomal recessive
microcephaly was ascertained. The family originated from the Mirpur region of
Pakistan (Fig. 1, family 1). According to the clinical histories, the family confirmed
that microcephaly was present from birth in all affected individuals and that there
was no history of epilepsy in affected individuals. On examination, head
circumferences were 5-9 SD below the population age-related mean. The affected
individuals examined were 13-28 years old, and mental retardation ranged from mild
to moderate in severity. None were able to read or write, but all could speak and had
basic self-care skills. Except for microcephaly, there were no dysmorphic features.
No affected individual had a sloping forehead, such as that described by Penrose
(Cowie 1960). Examination did not reveal weakness, spasticity or athetosis.
Computed tomography had been performed on one affected individual at 5 years of
age and results were normal. No environmental causes of microcephaly were
identified. All parents appeared to be of normal intelligence and had normal head
circumferences.

A further eight multiply affected consanguineous families were ascertained, with a 
total of 23 affected individuals displaying primary microcephaly. All of these 
families also originated from the Mirpur region of Pakistan and had pedigrees 
consistent with autosomal recessive inheritance. 

DNA Extraction and Microsatellite Analysis 

DNA was extracted from peripheral blood lymphocytes by means of a standard
nonorganic extraction procedure. The ABI Prism linkage mapping primer set was
used to perform a genomewide search. This panel contains 358 microsatellite repeat
markers spaced at ~10-cM intervals, with an average heterozygosity of 0.81. PCR
amplification of all the autosomal markers was performed according to the
manufacturer's specifications. Amplified markers were pooled and electrophoresed
on the ABI Prism 377 gene sequencer with a 4.2% polyacrylamide gel at 3000 V and
52°C for 2 h. Fragment-length analysis was performed using the ABI Prism
Genescan and Genotyper 1.1.1 analysis packages.

For fine mapping on 8p22-pter, D8S504 and D8S277 from the ABI Prism linkage set
were used, and a further seven polymorphic markers, from the Genome Database,
were selected: tel-D8S1824-D8S1798-D8S1819-D8S1825-D8S552-D8S1731-
D8S261-cen. PCR reactions were performed in 10-µl volumes that contained 50 ng
genomic DNA; 1 µM primers; 250 µM each dGTP, dCTP, dTTP, and dATP; 5 U
Taq DNA polymerase; and 1 x reaction buffer (1.5-2.0 mM MgCl2, 10 mM Tris-HCl
pH 9.0, 50 mM KCl, and 0.1% Triton X-100). Amplification was performed with a
5-min initial denaturing step at 95°C; 35 cycles of 94°C for 30 s, 54°C-60°C for 30 s,
and 72°C for 30 s; and a final incubation step at 72°C for 5 min.
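
As a rough aid to reading the cycling conditions above, the programme can be written out as data and its nominal hold time summed. This is only a sketch: the class and function names are hypothetical, and ramp times between temperatures, which depend on the instrument, are not included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: str   # kept as text because annealing is given as a range
    seconds: int


# Values taken from the amplification conditions stated in the text.
INITIAL = Step("initial denaturation", "95", 5 * 60)
CYCLE = (
    Step("denaturation", "94", 30),
    Step("annealing", "54-60", 30),
    Step("extension", "72", 30),
)
FINAL = Step("final incubation", "72", 5 * 60)
N_CYCLES = 35


def programme_seconds(initial=INITIAL, cycle=CYCLE, n_cycles=N_CYCLES, final=FINAL):
    """Sum of programmed hold times only; ramping between steps is ignored."""
    return initial.seconds + n_cycles * sum(s.seconds for s in cycle) + final.seconds


total = programme_seconds()
print(f"{total} s = {total / 60:.1f} min of programmed hold time")
```

Under these stated conditions the holds add up to 3750 s, i.e. 62.5 min per run before ramping is taken into account.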

Linkage Analysis 

A fully penetrant autosomal recessive mode of inheritance was assumed, and the
disease allele frequency was estimated at 1/300. Two-point analysis was performed
by the LINKAGE analysis programs (Terwilliger and Ott 1994) and HOMOZ-MAPMAKER
was used for multipoint analysis (Kruglyak et al. 1995). An allele
frequency of 0.1 was used in the genome screen for all markers. For further analysis
of the candidate region, marker allele frequencies were calculated by genotyping 34
unrelated individuals from the same ethnic population, with a lower limit for allele
frequencies set at 0.1. Heterogeneity testing was performed with the HOMOG
program (Morton 1955; Terwilliger and Ott 1994).
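
To show what the assumed disease allele frequency of 1/300 implies, the following sketch applies the Hardy-Weinberg proportions for a fully penetrant recessive trait. It is illustrative only: it assumes random mating, which would understate the frequency of affected individuals in consanguineous families such as those studied here, and the variable names are hypothetical.

```python
from fractions import Fraction

# Disease allele frequency assumed in the linkage analysis.
q = Fraction(1, 300)
p = 1 - q  # frequency of the normal allele

# Hardy-Weinberg genotype proportions (random mating assumed).
affected = q * q        # homozygous for the disease allele
carriers = 2 * p * q    # heterozygous, unaffected under a recessive model

print(f"affected: {affected} (about 1 in {1 / affected})")
print(f"carriers: {carriers} (about {float(carriers):.4%})")
```

With q = 1/300 this gives an expected affected frequency of 1 in 90,000 and a carrier frequency of 299/45,000, a little under 0.7%.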

True Microcephaly was thus mapped to chromosome 8p23 (the MCPH1 locus)
(Jackson, 1998) using homozygosity mapping to perform a genomewide search.
Refinement of the locus was achieved using further fluorescently labelled primers to
microsatellite markers in the region. The overlap between the homozygous regions
from families 1 and 2 (Figure 1) defined the minimal critical region within which the
disease gene lies, between D8S1825 and D8S1824. SEQ ID NO:1 maps to this
interval on the basis of radiation hybrid mapping data (Genemap 98, Figure 4). This
is additionally confirmed from genomic sequence data (SEQ ID NOS:1 and 9)
derived for the gene, which maps the gene to fully sequenced BACs (Figure 2).
These BACs map to the critical region by virtue of containing polymorphic markers
mapping within the critical region.
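
The genetic size of this critical region can be estimated from the marker order and inter-marker distances given in the legend to Figure 1. The sketch below simply accumulates those distances; it assumes the distances are additive along the map, and the helper names are hypothetical.

```python
from itertools import accumulate

# Marker order and inter-marker distances (cM) as given in the Figure 1 legend.
MARKERS = ["D8S504", "D8S1824", "D8S1798", "D8S277", "D8S1819",
           "D8S1825", "D8S552", "D8S1731", "D8S261"]
GAPS_CM = [3, 3, 3, 2, 5, 13, 5, 5]

# Cumulative position of each marker, measured from D8S504.
POSITIONS = dict(zip(MARKERS, accumulate([0] + GAPS_CM)))


def interval_cm(marker_a: str, marker_b: str) -> int:
    """Genetic distance between two markers on this map."""
    return abs(POSITIONS[marker_a] - POSITIONS[marker_b])


def markers_between(marker_a: str, marker_b: str) -> list:
    """Markers lying strictly inside the interval bounded by the two markers."""
    low, high = sorted((POSITIONS[marker_a], POSITIONS[marker_b]))
    return [m for m in MARKERS if low < POSITIONS[m] < high]


print(interval_cm("D8S1824", "D8S1825"))
print(markers_between("D8S1824", "D8S1825"))
```

On these figures the interval bounded by D8S1824 and D8S1825 spans 13 cM and contains D8S1798, D8S277 and D8S1819.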

Genetic Analysis of Oral Cancers 

Samples of oral cancers were obtained with local Ethics Committee approval from
patients undergoing resections of their tumours. DNA was extracted from 20 such
tumours and from the corresponding matched normal tissues, by standard techniques
well-known in the art, providing 20 pairs of matched normal and oral cancer DNA
specimens. Analysis of these paired specimens for loss of particular genetic loci in
the tumours, suggestive of the local presence of a tumour suppressor gene, was
performed by use of the polymerase chain reaction. Analysis of known
microsatellite markers including D8S1806, D8S1824, D8S1781, D8S1788 and D8S262
(see Figure 2) among others, showed frequent loss of one or both alleles at these loci
in the majority of the oral tumours. Loss of heterozygosity was particularly frequent
at the genetic markers D8S1824, D8S1781 and D8S1788.
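
The comparison of matched normal and tumour genotypes described above can be sketched as a simple classification on the presence or absence of alleles at one marker. This is a simplified illustration and not the procedure actually used: practical analyses usually compare allele peak ratios, because tumour samples tend to contain some normal tissue. The function name, result labels and allele sizes below are hypothetical.

```python
def classify_locus(normal_alleles, tumour_alleles):
    """Classify one microsatellite marker from matched normal and tumour alleles.

    Alleles are represented by their PCR product sizes in base pairs.
    """
    normal = frozenset(normal_alleles)
    tumour = frozenset(tumour_alleles)
    if not normal:
        raise ValueError("normal tissue must yield at least one allele")
    if not tumour:
        return "loss of both alleles"        # no product amplified from tumour DNA
    if not tumour <= normal:
        return "unexpected tumour allele"    # allele absent from the normal tissue
    if len(normal) == 1:
        return "non-informative"             # homozygous normal cannot show LOH
    if tumour == normal:
        return "heterozygosity retained"
    return "loss of heterozygosity"


# Hypothetical allele sizes for four matched pairs at one marker.
pairs = [({150, 158}, {150}), ({150, 158}, {150, 158}), ({150}, {150}), ({150, 158}, set())]
for normal, tumour in pairs:
    print(sorted(normal), sorted(tumour), "->", classify_locus(normal, tumour))
```

Only markers at which the normal tissue is heterozygous are informative for loss of heterozygosity, which is one reason several neighbouring markers are typed for each pair.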

The same matched tumour and normal tissue pairs were then compared for alterations
in the gene encoding SEQ ID NO:1. In several of these tumours, deletion of both
copies of this gene, i.e. loss of both alleles, was detected in tumour DNA while PCR
products of the expected size were amplified using DNA from matched normal
control tissue. In all other cases, the relative amount of PCR amplification product
generated using a variety of PCR primer pairs selected within SEQ ID NOS:1 to 8
was markedly reduced in the tumour DNA compared with that generated from
normal DNA. In cases where one copy of the gene encoding SEQ ID NO:1 was
apparently retained in tumour tissue, mutations were detected in the remaining DNA
such that the open reading frame encoding the protein of SEQ ID NOS:9 to 16 was
disrupted. In every case studied, the change in SEQ ID NOS:1 to 8 resulted in the
alteration of a codon encoding a normal amino acid to a mis-sense amino acid or
termination codon. Thus in these cases, the oral cancer cells were unable to
synthesise the protein of SEQ ID NOS:9 to 16, as a result either of deletion of both
copies of the gene described in SEQ ID NOS:1 to 8 or of deletion of one
copy and truncating or mis-sense mutation in the residual second copy of the gene.
This consistent loss of gene expression in tumours is entirely consistent with a role
for the protein in SEQ ID NOS:9 to 16 as a tumour suppressor protein. It also
supports the hypothesis that replacement of a functional gene by provision of the
nucleic acid sequence described in SEQ ID NOS:1 to 8 would have therapeutic utility
in the treatment of oral and other cancers demonstrating a similar pattern of loss of
heterozygosity. Such patterns have been observed in the past for a number of other
human malignancies including prostate cancer, breast cancer, ovarian cancer and
colorectal cancer. Thus the nucleic acid of SEQ ID NOS:1 to 8 and/or the protein of
SEQ ID NOS:9 to 16 may find equal utility in the treatment of these other common
human cancers.

Accordingly the nucleic acid molecules and proteins encoded thereby of the present
invention and products thereof, are of particular use in gene therapy and in
identifying those suffering from or with a predisposition towards cancers, particularly
oral cancers and neurological diseases.



Claims

1. An isolated nucleic acid, the nucleic acid being selected from the group
consisting of:

(a) DNAs having the nucleotide sequence given herein as any one of SEQ
ID NOS:1 to 8;

(b) nucleic acids which hybridise to DNAs of (a) above under stringent
conditions;

(c) nucleic acids having between 75-95% homology with any one of the
nucleotide sequences given herein as SEQ ID NOS:1 to 8; and

(d) nucleic acids which differ from the DNA of (a), (b) or (c) above due
to the degeneracy of the genetic code.

2. Nucleic acids according to claim 1 wherein the stringent conditions are 1 x
SSC, 0.1% SDS at 65°C.

3. Nucleic acids according to claim 1 consisting essentially of any one of SEQ
ID NOS:1 to 8.

4. Nucleic acids according to claim 1 which hybridise to any one of SEQ ID
NOS:1 to 8.

5. Nucleic acids according to claim 1 having between 75-95% homology with
any one of the nucleotide sequences given herein as SEQ ID NOS:1 to 8.

6. Nucleic acids according to claim 1 which differ from the DNAs of any one of
claims 3 to 5.

7. Use of a nucleic acid according to any preceding claim in determining loss of
genomic material or loss of expression of mRNA in a sample.

8. Use according to claim 7 in detecting the presence of or predisposition
towards oral or other cancers and/or neurological developmental abnormalities.

9. Use of a nucleic acid according to any one of claims 1 to 6 in determining the
presence of mutants in DNA.

10. Use according to claim 9 in identification of patients suffering from oral or
other cancers and/or neurological developmental abnormalities.

11. A polypeptide or a protein encoded by the nucleic acid molecules of any one
of claims 1 to 6.

12. A delivery vehicle comprising any one of the isolated nucleic acid molecules
of claims 1 to 6 or the polypeptides or proteins encoded thereby or antibodies to these
polypeptides or proteins.

13. A delivery vehicle according to claim 12 comprising a viral vector selected
from the group comprising an adenovirus, a retrovirus, a herpesvirus, a plasmid, a
phage, a phagemid or a liposome.

14. A delivery vehicle according to either claim 12 or 13 provided with surface
protein adapted to facilitate binding and/or penetration to a specific target.

15. A pharmaceutical composition comprising a nucleic acid according to any
one of claims 1 to 6, a polypeptide or protein according to claim 11 and/or the
delivery vehicle of any one of claims 12 to 14 and a suitable excipient, diluent or
carrier.

16. Antibodies which are specific binding partners of the polypeptide/protein of
claim 11 or fragment or derivative thereof which are capable of binding to the
antigenic part of the polypeptide/protein.

17. Antibodies according to claim 16 which are monoclonal and/or genetically
engineered to be humanised.

18. Use of antibodies or antibody fragments according to either claim 16 or 17 in
determining the presence or level of expression of the polypeptide or protein of claim
11.

19. Use of antibodies or antibody fragments according to either claim 16 or 17 or
fragments or derivatives thereof in detecting the presence or absence of binding
partners whose absence is indicative of oral or other cancers and/or neurological
disorders.

20. A method for the treatment of oral cancers and/or neurological disorders
comprising administering to a patient suffering from or predisposed to these
conditions the nucleic acid molecule of any one of SEQ ID NOS:1 to 8 and/or the
proteins encoded thereby.

21. A nucleic acid according to any one of claims 1 to 6 or polypeptide or protein
of claim 11 or delivery vehicle of any one of claims 12 to 14 for use as a
pharmaceutical.

22. A polyamino acid as set forth in any one of SEQ ID NOS:9 to 16 for use as a
pharmaceutical.

23. Use of the nucleic acids according to any one of claims 1 to 6, for the
manufacture of a medicament for the treatment of oral or other cancers and/or
neurological disorders.

24. A method of producing a transgenic non-human animal comprising disrupting
a gene comprising the nucleic acid of any one of claims 1 to 6, or the effective part
thereof, the gene encoding a protein or effective part thereof lack of which is
associated with oral or other cancers and/or lack of neurogenesis.

25. A method of producing a transgenic non-human animal comprising
preventing expression of a protein or polypeptide of claim 11, or the effective part
thereof, lack of expression of the protein being associated with oral or other cancers
and/or lack of neurogenesis.

26. A transgenic non-human animal whose somatic and germ cells do not contain
or express a gene encoding a nucleic acid according to any one of claims 1 to 6, the
gene having been deleted, mutated or disrupted in the animal or an ancestor of the
animal at an embryonic stage and wherein the gene may be operably linked to an
inducible promoter element.

27. A transgenic non-human animal according to any one of claims 24 to 26
wherein the animal is a rodent.

28. A reporter gene construct based on the promoter region of the gene, or
effective part thereof, comprising the nucleic acid of any one of claims 1 to 6.

29. Use of a reporter gene construct based on the promoter region of a gene, or
effective part thereof, comprising the nucleic acid of any one of claims 1 to 6 in the
detection/screening of pharmaceuticals and/or other compounds.

30. A method of determining the presence of or predisposition towards oral
cancer comprising:

(i) identifying regions of a DNA sample that contain the nucleic acid
according to any one of claims 1 to 6;

(ii) individually hybridising parallel samples of said DNAs with
oligonucleotides specific for alleles of the gene encoding any one of
said nucleic acids; and

(iii) identifying from among said DNA samples those with a loss of
heterozygosity for said alleles, wherein identification of a DNA
sample with a loss of heterozygosity indicates presence or a
predisposition towards oral cancer.

31. A modified method according to claim 30 wherein the sample comprises 
RNA. 

32. A method of determining the presence of or predisposition towards 
neurological developmental abnormalities comprising: 

(i) identifying regions of a DNA sample that contain the nucleic acid 
according to any one of claims 1 to 6; 

(ii) individually hybridising parallel samples of said DNAs with 
oligonucleotides specific for alleles of the gene encoding any one of said 
nucleic acids; and 

(iii) identifying from among said DNA samples those with a loss of 
heterozygosity for said alleles, wherein identification of a DNA sample with a 
loss of heterozygosity indicates presence or a predisposition towards 
neurological developmental abnormalities. 

33. A modified method according to claim 32 wherein the sample comprises 
RNA. 

34. A kit comprising the nucleic acids of any one of claims 1 to 6 and a set of 
instructions for use thereof. 

SEQ ID NO:1 

cDNA sequence (partial) 5.5kb 

ttttagggatggtatgaatttaatattttttagtattacaatatattcttataaaaaaggtccaagtg 
aaaaaggcgattgagttgaagtcaagaggagtcaagatgctgcccagcaaggATGGAAGCCATAAAAA 

CTCTGTCTGGCATATGGAATAACATCAACCATGTGACATCCGAAGAAGATACGTTCATTATGTATCTG 
GGAAAACCATGGCTTCAAGTGAAAATTCAAGTGAGCCAAGGAGGTGTTGCATTGGTCTCTGACATGTG 
TCCAGATCCTGGGATTCCAGAAAATGGTAGAAGAGCAGGTTCCGACTTCAGGGTTGGTGCAAATGTAC 
AGTTTTCATGTGAGGACAATTACGTGCTCCAGGGATCTAAAAGCATCACCTGTCAGAGAGTTACAGAG 
ACGCTCGCTGCTTGGAGTGACCACAGGCCCATCTGCCGAGCGAGAACATGTGGATCCAATCTGCGTGG 

GCCCAGCGGCGTCATTACCTCCCCTAATTATCCGGTTCAGTATGAAGATAATGCACACTGTGTGTGGG 
TCATCACCACCACCGACCCGGACAAGGTCATCAAGCTTGCCTTTGAAGAGTTTGAGCTGGAGCGAGGC 
TATGACACCCTGACGGTTGGTGATGCTGGGAAGGTGGGAGACACCAGATCGGTCTTGTACGTGCTCAC 
GGGATCCAGTGTTCCTGACCTCATTGTGAGCATGAGCAACCAGATGTGGCTACATCTGCAGTCGGATG 
ATAGCATTGGCTCACCTGGGTTTAAAGCTGTTTACCAAGAAATTGAAAAGGGAGGGTGTGGGGATCCT 

GGAATCCCCGCCTATGGGAAGCGGACGGGCAGCAGTTTCCTCCATGGAGATACACTCACCTTTGAATG 
CCCGGCGGCCTTTGAGCTGGTGGGGGAGAGAGTTATCACCTGTCAGCAGAACAATCAGTGGTCTGGCA 
ACAAGCCCAGCTGTGTATTTTCATGTTTCTTCAACTTTACGGCATCATCTGGGATTATTCTGTCACCA 
AATTATCCAGAGGAATATGGGAACAACATGAACTGTGTCTGGTTGATTATCTCGGAGCCAGGAAGTCG 
AATTCACCTAATCTTTAATGATTTTGATGTTGAGCCTCAATTTGACTTTCTCGCGGTCAAGGATGATG 

GCATTTCTGACATAACTGTCCTGGGTACTTTTTCTGGCAATGAAGTGCCTTCCCAGCTGGCCAGCAGT 
GGGCATATAGTTCGCTTGGAATTTCAGTCTGACCATTCCACTACTGGCAGAGGGTTCAACATCACTTA 
CACCACATTTGGTCAGAATGAGTGCCATGATCCTGGCATTCCTATAAACGGACGACGTTTTGGTGACA 
GGTTTCTACTCGGGAGCTCGGTTTCTTTCCACTGTGATGATGGCTTTGTCAAGACCCAGGGATCCGAG 
TCCATTACCTGCATACTGCAAGACGGGAACGTGGTCTGGAGCTCCACCGTGCCCCGCTGTGAAGCTCC 

ATGTGGTGGACATCTGACAGCGTCCAGCGGAGTCATTTTGCCTCCTGGATGGCCAGGATATTATAAGG 
ATTCTTTACATTGTGAATGGATAATTGAAGCAAAACCAGGCCACTCTATCAAAATAACTTTTGACAGA 
TTTCAGACAGAGGTCAATTATGACACCTTGGAGGTCAGAGATGGGCCAGCCAGTTCGTCCCCACTGAT 
CGGCGAGTACCACGGCACCCAGGCACCCCAGTTCCTCATCAGCACCGGGAACTTCATGTACCTGCTAT 
TCACCACTGACAACAGCCGCTCCAGCATCGGCTTCCTCATCCACTATGAGAGTGTGACGCTTGAGTCG 

GATTCCTGCCTGGACCCGGGCATCCCTGTGAACGGCCATCGCCACGGTGGAGACTTTGGCATCAGGTC 
CACAGTGACTTTCAGCTGTGACCCGGGGTACACACTAAGTGACGACGAGCCCCTCGTCTGTGAGAGGA 
ACCACCAGTGGAACCACGCCTTGCCCAGCTGCGACGCTCTATGTGGAGGCTACATCCAAGGGAAGAGT 
GGAACAGTCCTTTCTCCTGGGTTTCCAGATTTTTATCCAAACTCTCTAAACTGCACGTGGACCATTGA 
AGTGTCTCATGGGAAAGGAGTTCAAATGATCTTTCACACCTTTCATCTTGAGAGTTCCCACGACTATT 

TACTGATCACAGAGGATGGAAGTTTTTCCGAGCCCGTTGCCAGGCTCACCGGGTCGGTGTTGCCTCAT 
ACGATCAAGGCAGGCCTGTTTGGAAACTTCACTGCCCAGCTTCGGTTTATATCAGACTTCTCAATTTC 
GTACGAGGGCTTCAATATCACATTTTCAGAATATGACCTGGAGCCATGTGATGATCCTGGAGTCCCTG 
CCTTCAGCCGAAGAATTGGTTTTCACTTTGGTGTGGGAGACTCTCTGACGTTTTCCTGCTTCCTGGGA 
TATCGTTTAGAAGGTGCCACCAAGCTTACCTGCCTGGGTGGGGGCCGCCGTGTGTGGAGTGCACCTCT 

GCCAAGGTGTGTGGCCGAATGTGGAGCAAGTGTCAAAGGAAATGAAGGAACATTACTGTCTCCAAATT 
TTCCATCCAATTATGATAATAACCATGAGTGTATCTATAAAATAGAAACAGAAGCCGGCAAGGGCATC 
CACCTTAGAACACGAAGCTTCCAGCTGTTTGAAGGAGATACTCTAAAGGTATATGATGGAAAAGACAG 
TTCCTCACGTCCACTGGGCACGTTCACTAAAAATGAACTTCTGGGGCTGATCCTAAACAGCACATCCA 
ATCACCTGTGGCTAGAGTTCAACACCAATGGATCTGACACCGACCAAGGTTTTCAACTCACCTATACC 

AGTTTTGATCTGGTAAAATGTGAGGATCCGGGCATCCCTAACTACGGCTATAGGATCCGTGATGAAGG 
CCACTTTACCGACACTGTAGTTCTGTACAGTTGCAACCCGGGGTACGCCATGCATGGCAGCAACACCC 
TGACCTGTTTGAGTGGAGACAGGAGAGTGTGGGACAAACCACTACCTTCGTGCATAGCGGAATGTGGT 
GGTCAGATCCATGCAGCCACATCAGGACGAATATTGTCCCCTGGCTATCCAGCTCCGTATGACAACAA 
CCTCCACTGCACCTGGATTATAGAGGCAGACCCAGGAAAGACCATTAGCCTCCATTTCATTGTTTTCG 

ACACGGAGATGGCTCACGACATCtTCAAGGTCTGGGACGGGCCGGTGGACAGTGACATCCTGCTGAAG 
GAGTGGAGTGGCTCCGCCCTTCCGGAGGACATCCACAGCACCTTCAACTCACTCACCCTGCAGTTCGA 
CAGCGACTTCTTCATCAGCAAGTCTGGCTTCTCCATCCAGTTCTCCACCTCAATTGCAGCCACCTGTA 
ACGATCCAGGTATGCCCCAAAATGGCACCCGCTATGGAGACAGCAGAGAGGCTGGAGACACCGTCACA 
TTCCAGTGTGACCCTGGCTATCAGCTCCAAGGACAAGCCAAAATCACCTGTGTGCAGCTGAATAACCG 

GTTCTTTTGGCAACCAGACCCTCCTACATGCATAGCTGCTTGTGGAGGGAATCTGACGGGCCCAGCAG 
GTGTTATTTTGTCACCCAACTACCCACAGCCGTATCCTCCTGGGAAGGAATGTGACTGGAGAGTAAAA 
GTGAACCCGGACTTTGTCATCGCCTTGATATTCAAAAGTTTCAACATGGAGCCCAGCTATGACTTCCT 
ACACATCTATGAAGGGGAAGATTCCAACAGCCCCCTCATTGGGAGTTACCAGGGCTCTCAGGCCCCAG 
AAAGAATAGAGAGTAGCGGAAACAGCCTGTTTCTGGCATTTCGGAGTGATGCCTCCGTGGGCCTTTCA 
GGGTTCGCCATTGAATTTAAAGAGAAACCACGGGAAGCTTGTTTTGACCCAGGAAATATAATGAATGG 
GACAAGAGTTGGAACAGACTTCAAGCTTGGCTCCACCATCACCTACCAGTGTGACTCTGGCTATAAGA 
TTCTTGACCCCTCATCCATCACCTGTGTGATTGGGGCTGATGGGAAACCCTCCTGGGACCAAGTGCTG 
CCCTCCTGCAATGCTCCCTGTGGAGGCCAGTACACGGGATCAGAAGGGGTAGTTTTATCACCAAACTA 
CCCCCATAATTACACAGCTGGTCAAATATGCCTCTATTCCATCACGGTACCAAAGGAATTCGTGGTCT 
TTGGACAGTTTGCCTATTTCCAGACAGCCCTGAATGATTTGGCAGAATTATTTGATGGAACCCATGCA 
CAGGCCAGACTTCTCAGCTCACTCTCGGGGTCTCACTCAGGGGAAACATTGCCCTTGGCTACGTCAAA 

TCAAATTCTGCTCCGATTCAGTGCAAAGAGCGGTGCCTCTGCCCGCGGCTTCCACTTCGTGTATCAAG 
CTGTTCCTCGTACCAGTGACACCCAATGCAGCTCTGTCCCCGAGCCCAGATACGGAAGGAGAATTGGT 
TCTGAGTTTTCTGCCGGCTCCATCGTCCGATTCGAGTGCAACCCGGGATACCTGCTTCAGGGTTCCAC 
GGCGCTCCACTGCCAGTCCGTGCCCAACGCCTTGGCACAGTGGAACGACACGATCCCCAGCTGTGTGG 
TACCCTGCAGTggcaattTCACTCAACGAAGAGGTACAATCCTGTCCCCCGGCTACCCTGAGCCAtAC 

GGAAACAACTTGAACTGTATATGGAAGATCATAGTTACGGAGGGCTCGGGAATTCAGATCCAAGTGAT 
CAGTTTTGCCACGGAGCAGAACTGGGACTCCCTTGAGATCCACGATGGTGGGGATGTGACCGCACCCA 
GACTGGGAAGCTTCTCAGGCACCACAGTACCGGCACTGCTGAACAGTACTTCCAACCAACTCTACCTG 
CATTTCCAGTCTGACATTAGTGTGGCAGCTGCTGGTTTCCACCTGGAATACAAAACTGTAGGTCTTGC 
TGCATGCCAAGAACCAGCCCTCCCCAGCAACAGCATCAAAATCGGAGATCGGTACATGGTGAACGACG 

TGCTCTCCTTCCAGTGCGAGCCCGGGTACACCCTGCAGGGCCGTTCCCACATTTCCTGTATGCCAGGG 
ACCGTTCGCCGTTGGAACTATCCGTCTCCCCTGTGCATTGCAACCTGTGGAGGGACGCTGAGCACCTT 
GGGTGGTGTGATCCTGAGCCCCGGCTTCCCAGGTTCTTACCCCAACAACTTAGACTGCACCTGGAGGA 
TCTCATTACCCATCGGCTATGGTGCACATATTCAGTTTCTGAATTTTTCTACCGAAGCTAATCATGAC 
TTCCTTGAAATTCAAAATGGACCTTACCACACCAGCCCCATGATTGGACAATTTAGCGGCACGGATCT 

CCCCGCGGCCCTGCTGAGCACAACGCATGAAACCCTCATCCACTTTTATAGTGACCATTCGCAAAACC 
GGCAAGGATTTAAACTTGCTTACCAAGCCTATGAATTACAGAACTGTCCAGATCCACCCCCATTTCAG 
AATGGGTACATGATCAACTCGGATTACAGCGTGGGGCAATCAGTATCTTTCGAGTGTTATCCTGGGTA 
CATTCTAATAGGCCATCCTCCG 

SEQ ID NO:2 

3V1 Nucleotide sequence 6145 bp 

1 TTTTAGGGAT GGTATGAATT TAATATTTTT TAGTATTACA ATATATTCTT 

51 ATAAAAAAGG TCCAAGTGAA AAAGGCGATT GAGTTGAAGT CAAGAGGAGT 

101 CAAGATGCTG CCCAGCAAGG ATGGAAGCCA TAAAAACTCT GTCTGGCATA 

151 TGGAATAACA TCAACCATGT GACATCCGAA GAAGATACGT TCATTATGTA 

201 TCTGGGAAAA CCATGGCTTC AAGTGAAAAT TCAAGTGAGC CAAGGAGGTG 

251 TTGCATTGGT CTCTGACATG TGTCCAGATC CTGGGATTCC AGAAAATGGT 

301 AGAAGAGCAG GTTCCGACTT CAGGGTTGGT GCAAATGTAC AGTTTTCATG 

351 TGAGGACAAT TACGTGCTCC AGGGATCTAA AAGCATCACC TGTCAGAGAG 

401 TTACAGAGAC GCTCGCTGCT TGGAGTGACC ACAGGCCCAT CTGCCGAGCG 

451 AGAACATGTG GATCCAATCT GCGTGGGCCC AGCGGCGTCA TTACCTCCCC 

501 TAATTATCCG GTTCAGTATG AAGATAATGC ACACTGTGTG TGGGTCATCA 

551 CCACCACCGA CCCGGACAAG GTCATCAAGC TTGCCTTNGA AGAGTTTGAG 

601 CTGGAGCGAG GCTATGACAC CCTNACGGTT GGTGATGCTG GGAAGGTGGG 

651 AGACACCAGA TCGGTCTTGT ANGTGCTCAC GGGATCCAGT GTTCCTGACC 

701 TCATTGTGAG CATGAGCAAC CAGATGTGGC TACATCTGCA GTCGGATGAT 

751 AGCATTGGCT CACCTGGGTT TAAAGCTGTT TACCAAGAAA TTGAAAAGGG 

801 AGGGTGTGGG GATCCTGGAA TCCCCGCCTA TGGGAAGCGG AGGGGCAGCA 

851 GTTTCCTCCA TGGAGATACA CTCACCTTTG AATGCCCGGC GGCCTTTGAG 

901 CTGGTGGGGG AGAGAGTTAT CACCTGTCAG CAGAACAATC AGTGGTCTGG 

951 CAACAAGCCC AGCTGTGTAT TTTCATGTTT CTTCAACTTT ACGGCATCAT 

1001 CTGGGATTAT TCTGTCACCA AATTATCCAG AGGAATATGG GAACAACATG 

1051 AACTGTGTCT GGTTGATTAT CTCGGAGCCA GGAAGTCGAA TTCACCTAAT 

1101 CTTTAATGAT TTTGATGTTG AGCCTCAATT TGACTTTCTC GCGGTCAAGG 

1151 ATGATGGCAT TTCTGACATA ACTGTCCTGG GTACTTTTTC TGGCAATGAA 

1201 GTGCCTTCCC AGCTGGCCAG CAGTGGGCAT ATAGTTCGCT TGGAATTTCA 

1251 GTCTGACCAT TCCACTACTG GCAGAGGGTT CAACATCACT TACACCACAT 

1301 TTGGTCAGAA TGAGTGCCAT GATCCTGGCA TTCCTATAAA CGGACGACGT 

1351 TTTGGTGACA GGTTTCTACT CGGGAGCTCG GTTTCTTTCC ACTGTGATGA 

1401 TGGCTTTGTC AAGACCCAGG GATCCGAGTC CATTACCTGC ATACTGCAAG 

1451 ACGGGAACGT GGTCTGGAGC TCCACCGTGC CCCGCTGTGA AGCTCCATGT 

1501 GGTGGACATC TGACAGCGTC CAGCGGAGTC ATTTTGCCTC CTGGATGGCC 

1551 AGGATATTAT AAGGATTCTT TACATTGTGA ATGGATAATT GAAGCAAAAC 

1601 CAGGCCACTC TATCAAAATA ACTTTTGACA GATTTCAGAC AGAGGTCAAT 

1651 TATGACACCT TGGAGGTCAG AGATGGGCCA GCCAGTTCGT CCCCACTGAT 

1701 CGGCGAGTAC CACGGCACCC AGGCACCCCA GTTCCTCATC AGCACCGGGA 

1751 ACTTCATGTA CCTGCTATTC AGCACTGACA ACAGCCGCTC CAGCATCGGC 

1801 TTCCTCATCC ACTATGAGAG TGTGACGCTT GAGTCGGATT CCTGCCTGGA 

1851 CCCGGGCATC CCTGTGAACG GCCATCGCCA CGGTGGAGAC TTTGGCATCA 

1901 GGTCCACAGT GACTTTCAGC TGTGACCCGG GGTACACACT AAGTGACGAC 

1951 GAGCCCCTCG TCTGTGAGAG GAACCACCAG TGGAACCACG CCTTGCCCAG 

2001 CTGCGACGCT CTATGTGGAG GCTACATCCA AGGGAAGAGT GGAACAGTCC 

2051 TTTCTCCTGG GTTTCCAGAT TTTTATCCAA ACTCTCTAAA CTGCACGTGG 

2101 ACCATTGAAG TGTCTCATGG GAAAGGAGTT CAAATGATCT TTCACACCTT 

2151 TCATCTTGAG AGTTCCCACG ACTATTTACT GATCACAGAG GATGGAAGTT 

2201 TTTCCGAGCC CGTTGCCAGG CTCACCGGGT CGGTGTTGCC TCATACGATC 

2251 AAGGCAGGCC TGTTNGGAAA CTTCACTGCC CAGCTTCGGT TTATATCAGA 

2301 CTTCTCAATT TCGTACGAGG GCTTCAATAT CACATTTTCA GAATATGACC 

2351 TGGAGCCATG TGATGATCCT GGAGTCCCTG CCTTCAGCCG AAGAATTGGT 

2401 TTTCACTTTG GTGTGGGAGA CTCTCTGACG TTTTCCTGCT TCCTGGGATA 

2451 TCGTTTAGAA GGTGCCACCA AGCTTACCTG CCTGGGTGGG GGCCGCCGTG 

2501 TGTGGAGTGC ACCTCTGCCA AGGTGTGTGG CCGAATGTGG AGCAAGTGTC 

2551 AAAGGAAATG AAGGAACATT ACTGTCTCCA AATTTTCCAT CCAATTATGA 

2601 TAATAACCAT GAGTGTATCT ATAAAATAGA AACAGAAGCC GGCAAGGGCA 

2651 TCCACCTTAG AACACGAAGC TTCCAGCTGT TTGAAGGAGA TACTCTAAAG 

2701 GTATATGATG GAAAAGACAG TTCCTCACGT CCACTGGGCA CGTTCACTAA 

2751 AAATGAACTT CTGGGGCTGA TCCTAAACAG CACATCCAAT CACCTGTGGC 

2801 TAGAGTTCAA CACCAATGGA TCTGACACCG ACCAAGGTTT TCAACTCACC 

2851 TATACCAGTT TTGATCTGGT AAAATGTGAG GATCCGGGCA TCCCTAACTA 

2901 CGGCTATAGG ATCCGTGATG AAGGCCACTT TACCGACACT GTAGTTCTGT 

2951 ACAGTTGCAA CCCGGGGTAC GCCATGCATG GCAGCAACAC CCTGACCTGT 

3001 TTGAGTGGAG ACAGGAGAGT GTGGGACAAA CCACTACCTT CGTGCATAGC 
3051 GGAATGTGGT GGTCAGATCC ATGCAGCCAC ATCAGGACGA ATATTGTCCC 
3101 CTGGCTATCC AGCTCCGTAT GACAACAACC TCCACTGCAC CTGGATTATA 
3151 GAGGCAGACC CAGGAAAGAC CATTAGCCTC CATTTCATTG TTTTCGACAC 
3201 GGAGATGGCT CACGACATCC TCAAGGTCTG GGACGGGCCG GTGGACAGTG 
3251 ACATCCTGCT GAAGGAGTGG AGTGGCTCCG CCCTTCCGGA GGACATCCAC 
3301 AGCACCTTCA ACTCACTCAC CCTGCAGTTC GACAGCGACT TCTTCATCAG 
3351 CAAGTCTGGC TTCTCCATCC AGTTCTCCAC CTCAATTGCA GCCACCTGTA 
3401 ACGATCCAGG TATGCCCCAA AATGGCACCC GCTATGGAGA CAGCAGAGAG 
3451 GCTGGAGACA CCGTCACATT CCAGTGTGAC CCTGGCTATC AGCTCCAAGG 
3501 ACAAGCCAAA ATCACCTGTG TGCAGCTGAA TAACCGGTTC TTTTGGCAAC 
3551 CAGACCCTCC TACATGCATA GCTGCTTGTG GAGGGAATCT GACGGGCCCA 
3601 GCAGGTGTTA TTTTGTCACC CAACTACCCA CAGCCGTATC CTCCTGGGAA 
3651 GGAATGTGAC TGGAGAGTAA AAGTGAACCC GGACTTTGTC ATCGCCTTGA 
3701 TATTCAAAAG TTTCAACATG GAGCCCAGCT ATGACTTCCT ACACATCTAT 
3751 GAAGGGGAAG ATTCCAACAG CCCCCTCATT GGGAGTTACC AGGGCTCTCA 
3801 GGCCCCAGAA AGAATAGAGA GTAGCGGAAA CAGCCTGTTT CTGGCATTTC 
3851 GGAGTGATGC CTCCGTGGGC CTTTCAGGGT TCGCCATTGA ATTTAAAGAG 
3901 AAACCACGGG AAGCTTGTTT TGACCCAGGA AATATAATGA ATGGGACAAG 
3951 AGTTGGAACA GACTTCAAGC TTGGCTCCAC CATCACCTAC CAGTGTGACT 
4001 CTGGCTATAA GATTCTTGAC CCCTCATCCA TCACCTGTGT GATTGGGGCT 
4051 GATGGGAAAC CCTCCTGGGA CCAAGTGCTG CCCTCCTGCA ATGCTCCCTG 
4101 TGGAGGCCAG TACACGGGAT CAGAAGGGGT AGTTTTATCA CCAAACTACC 
4151 CCCATAATTA CACAGCTGGT CAAATATGCC TCTATTCCAT CACGGTACCA 
4201 AAGGAATTCG TGGTCTTTGG ACAGTTTGCC TATTTCCAGA CAGCCCTGAA 
4251 TGATTTGGCA GAATTATTTG ATGGAACCCA TGCACAGGCC AGACTTCTCA 
4301 GCTCACTCTC GGGGTCTCAC TCAGGGGAAA CATTGCCCTT GGCTACGTCA 
4351 AATCAAATTC TGCTCCGATT CAGTGCAAAG AGCGGTGCCT CTGCCCGCGG 
4401 CTTCCACTTC GTGTATCAAG CTGTTCCTCG TACCAGTGAC ACCCAATGCA 
4451 GCTCTGTCCC CGAGCCCAGA TACGGAAGGA GAATTGGTTC TGAGTTTTCT 
4501 GCCGGCTCCA TCGTCCGATT CGAGTGCAAC CCGGGATACC TGCTTCAGGG 
4551 TTCCACGGCG CTCCACTGCC AGTCCGTGCC CAACGCCTTG GCAGAGTGGA 
4601 ACGACACGAT CCCCAGCTGT GTGGTACCCT GGAGTGGCAA TTTCACTCAA 
4651 CGAAGAGGTA CAATCCTGTC CCCCGGCTAC CCTGAGCCAT ACGGAAACAA 
4701 CTTGAACTGT ATATGGAAGA TCATAGTTAC GGAGGGCTCG GGAATTCAGA 
4751 TCCAAGTGAT CAGTTTTGCC ACGGAGCAGA ACTGGGACTC CCTTGAGATC 
4801 CACGATGGTG GGGATGTGAC CGCACCCAGA CTGGGAAGCT TCTCAGGCAC 
4851 CACAGTACCG GCACTGCTGA ACAGTACTTC CAACCAACTC TACCTGCATT 
4901 TCCAGTCTGA CATTAGTGTG GCAGCTGCTG GTTTCCACCT GGAATACAAA 
4951 ACTGTAGGTC TTGCTGCATG CCAAGAACCA GCCCTCCCCA GCAACAGCAT 
5001 CAAAATCGGA GATCGGTACA TGGTGAACGA CGTGCTCTCC TTCCAGTGCG 
5051 AGCCCGGGTA CACCCTGCAG GGCCGTTCCC ACATTTCCTG TATGCCAGGG 
5101 ACCGTTCGCC GTTGGAACTA TCCGTCTCCC CTGTGCATTG CAACCTGTGG 
5151 AGGGACGCTG AGCACCTTGG GTGGTGTGAT CCTGAGCCCC GGCTTCCCAG 
5201 GTTCTTACCC CAACAACTTA GACTGCACCT GGAGGATCTC ATTACCCATC 
5251 GGCTATGGTG CACATATTCA GTTTCTGAAT TTTTCTACCG AAGCTAATCA 
5301 TGACTTCCTT GAAATTCAAA ATGGACCTTA CCACACCAGC CCCATGATTG 
5351 GACAATTTAG CGGCACGGAT CTCCCCGCGG CCCTGCTGAG CACAACGCAT 
5401 GAAACCCTCA TCCACTTTTA TAGTGACCAT TCGCAAAACC GGCAAGGATT 
5451 TAAACTTGCT TACCAAGNTA TGGAACAACA ACGAGAACCG AAACCCAAAT 
5501 CTAAATACAC TTCTTACATG TAAATTGTAT TTAAGTATAA ATCTCCCTAA 
5551 CTGGTTCCAA GCTTGTACGA GTGGAATAAT TTTTTGGTGG AATGTTGGTT 
5601 TCTGGTTAGT AGTGGAACAC TTGTTGTTTT TGAAAACAGA GGTAAGGACA 
5651 CAGACGGAAC CACCAGTGGG TTCGCCTTTT CTGCTGCCCA GACAGAGCCG 
5701 ATTTATCAAG ACGGGAATTG CAATGGAGAA AGAGTAATTC ACGCAGAGCC 
5751 AGATGTGTGG GAGACCGGAG TTTTATTGTG ACTCAATTCA GTCTCCCCAG 
5801 CATTCAGGGA TTCAAGTTTT TAAAGATAAT TTGGCGGCCG GGCGCGGTGG 
5851 CTCACGCCTG TAATCCCAGC ACTTTGGAAG GCCGAGGCGG GCGGATCACG 
5901 AGGTCAGGAG ATCGAGACCA TCCTGGCTAA CACGGTGAAA CCCCGTCTCT 
5951 ACTAAAAATA CCAAAAATTA GCCGGGCATA GTGGCGGGCG CCTGTAGTCC 
6001 CAGCTACTCG GGAGGCTGAG GCAGGANAGT GGCGTGAACC CGGGAGGCGG 
6051 AGCTTGCAGT GAGGAGAGAT CGCGCCACTG CACTCCAGCC TGGGCGACAG 
6101 AGCCAGACTC CATCTCGAAA AAAAAAAAAA AAAAAAAAAA AAAAA 

SEQ ID NO:3 



5C-3V2 Nucleotide sequence 6409 bp 

1 TTTTAGGGAT GGTATGAATT TAATATTTTT TAGTATTACA ATATATTCTT 

51 ATAAAAAAGG TCCAAGTGAA AAAGGCGATT GAGTTGAAGT CAAGAGGAGT 

101 CAAGATGCTG CCCAGCAAGG ATGGAAGCCA TAAAAACTCT GTCTGGCATA 

151 TGGAATAACA TCAACCATGT GACATCCGAA GAAGATACGT TCATTATGTA 

201 TCTGGGAAAA CCATGGCTTC AAGTGAAAAT TCAAGTGAGC CAAGGAGGTG 

251 TTGCATTGGT CTCTGACATG TGTCCAGATC CTGGGATTCC AGAAAATGGT 

301 AGAAGAGCAG GTTCCGACTT CAGGGTTGGT GCAAATGTAC AGTTTTCATG 

351 TGAGGACAAT TACGTGCTCC AGGGATCTAA AAGCATCACC TGTCAGAGAG 

401 TTACAGAGAC GCTCGCTGCT TGGAGTGACC ACAGGCCCAT CTGCCGAGCG 

451 AGAACATGTG GATCCAATCT GCGTGGGCCC AGCGGCGTCA TTACCTCCCC 

501 TAATTATCCG GTTCAGTATG AAGATAATGC ACACTGTGTG TGGGTCATCA 

551 CCACCACCGA CCCGGACAAG GTCATCAAGC TTGCCTTNGA AGAGTTTGAG 

601 CTGGAGCGAG GCTATGACAC CCTNACGGTT GGTGATGCTG GGAAGGTGGG 

651 AGACACCAGA TCGGTCTTGT ANGTGCTCAC GGGATCCAGT GTTCCTGACC 

701 TCATTGTGAG CATGAGCAAC CAGATGTGGC TACATCTGCA GTCGGATGAT 

751 AGCATTGGCT CACCTGGGTT TAAAGCTGTT TACCAAGAAA TTGAAAAGGG 

801 AGGGTGTGGG GATCCTGGAA TCCCCGCCTA TGGGAAGCGG ACGGGCAGCA 

851 GTTTCCTCCA TGGAGATACA CTCACCTTTG AATGCCCGGC GGCCTTTGAG 

901 CTGGTGGGGG AGAGAGTTAT CACCTGTCAG CAGAACAATC AGTGGTCTGG 

951 CAACAAGCCC AGCTGTGTAT TTTCATGTTT CTTCAACTTT ACGGCATCAT 

1001 CTGGGATTAT TCTGTCACCA AATTATCCAG AGGAATATGG GAACAACATG 

1051 AACTGTGTCT GGTTGATTAT CTCGGAGCCA GGAAGTCGAA TTCACCTAAT 

1101 CTTTAATGAT TTTGATGTTG AGCCTCAATT TGACTTTCTC GCGGTCAAGG 

1151 ATGATGGCAT TTCTGACATA ACTGTCCTGG GTACTTTTTC TGGCAATGAA 

1201 GTGCCTTCCC AGCTGGCCAG CAGTGGGCAT ATAGTTCGCT TGGAATTTCA 

1251 GTCTGACCAT TCCACTACTG GCAGAGGGTT CAACATCACT TACACCACAT 

1301 TTGGTCAGAA TGAGTGCCAT GATCCTGGCA TTCCTATAAA CGGACGACGT 

1351 TTTGGTGACA GGTTTCTACT CGGGAGCTCG GTTTCTTTCC ACTGTGATGA 

1401 TGGCTTTGTC AAGACCCAGG GATCCGAGTC CATTACCTGC ATACTGCAAG 

1451 ACGGGAACGT GGTCTGGAGC TCCACCGTGC CCCGCTGTGA AGCTCCATGT 

1501 GGTGGACATC TGACAGCGTC CAGCGGAGTC ATTTTGCCTC CTGGATGGCC 

1551 AGGATATTAT AAGGATTCTT TACATTGTGA ATGGATAATT GAAGCAAAAC 

1601 CAGGCCACTC TATCAAAATA ACTTTTGACA GATTTCAGAC AGAGGTCAAT 

1651 TATGACACCT TGGAGGTCAG AGATGGGCCA GCCAGTTCGT CCCCACTGAT 

1701 CGGCGAGTAC CACGGCACCC AGGCACCCCA GTTCCTCATC AGCACCGGGA 

1751 ACTTCATGTA CCTGCTATTC ACCACTGACA ACAGCCGCTC CAGCATCGGC 

1801 TTCCTCATCC ACTATGAGAG TGTGACGCTT GAGTCGGATT CCTGCCTGGA 

1851 CCCGGGCATC CCTGTGAACG GCCATCGCCA CGGTGGAGAC TTTGGCATCA 

1901 GGTCCACAGT GACTTTCAGC TGTGACCCGG GGTACACACT AAGTGACGAC 

1951 GAGCCCCTCG TCTGTGAGAG GAACCACCAG TGGAACCACG CCTTGCCCAG 

2001 CTGCGACGCT CTATGTGGAG GCTACATCCA AGGGAAGAGT GGAACAGTCC 

2051 TTTCTCCTGG GTTTCCAGAT TTTTATCCAA ACTCTCTAAA CTGCACGTGG 

2101 ACCATTGAAG TGTCTCATGG GAAAGGAGTT CAAATGATCT TTCACACCTT 

2151 TCATCTTGAG AGTTCCCACG ACTATTTACT GATCACAGAG GATGGAAGTT 

2201 TTTCCGAGCC CGTTGCCAGG CTCACCGGGT CGGTGTTGCC TCATACGATC 

2251 AAGGCAGGCC TGTTNGGAAA CTTCACTGCC CAGCTTCGGT TTATATCAGA 

2301 CTTCTCAATT TCGTACGAGG GCTTCAATAT CACATTTTCA GAATATGACC 

2351 TGGAGCCATG TGATGATCCT GGAGTCCCTG CCTTCAGCCG AAGAATTGGT 

2401 TTTCACTTTG GTGTGGGAGA CTCTCTGACG TTTTCCTGCT TCCTGGGATA 

2451 TCGTTTAGAA GGTGCCACCA AGCTTACCTG CCTGGGTGGG GGCCGCCGTG 

2501 TGTGGAGTGC ACCTCTGCCA AGGTGTGTGG CCGAATGTGG AGCAAGTGTC 

2551 AAAGGAAATG AAGGAACATT ACTGTCTCCA AATTTTCCAT CCAATTATGA 

2601 TAATAACCAT GAGTGTATCT ATAAAATAGA AACAGAAGCC GGCAAGGGCA 

2651 TCCACCTTAG AACACGAAGC TTCCAGCTGT TTGAAGGAGA TACTCTAAAG 

2701 GTATATGATG GAAAAGACAG TTCCTCACGT CCACTGGGCA CGTTCACTAA 

2751 AAATGAACTT CTGGGGCTGA TCCTAAACAG CACATCCAAT CACCTGTGGC 

2801 TAGAGTTCAA CACCAATGGA TCTGACACCG ACCAAGGTTT TCAACTCACC 

2851 TATACCAGTT TTGATCTGGT AAAATGTGAG GATCCGGGCA TCCCTAACTA 

2901 CGGCTATAGG ATCCGTGATG AAGGCCACTT TACCGACACT GTAGTTCTGT 

2951 ACAGTTGCAA CCCGGGGTAC GCCATGCATG GCAGCAACAC CCTGACCTGT 
3001 TTGAGTGGAG ACAGGAGAGT GTGGGACAAA CCACTACCTT CGTGCATAGC 
3051 GGAATGTGGT GGTCAGATCC ATGCAGCCAC ATCAGGACGA ATATTGTCCC 
3101 CTGGCTATCC AGCTCCGTAT GACAACAACC TCCACTGCAC CTGGATTATA 
3151 GAGGCAGACC CAGGAAAGAC CATTAGCCTC CATTTCATTG TTTTCGACAC 
3201 GGAGATGGCT CACGACATCC TCAAGGTCTG GGACGGGCCG GTGGACAGTG 
3251 ACATCCTGCT GAAGGAGTGG AGTGGCTCCG CCCTTCCGGA GGACATCCAC 
3301 AGCACCTTCA ACTCACTCAC CCTGCAGTTC GACAGCGACT TCTTCATCAG 
3351 CAAGTCTGGC TTCTCCATCC AGTTCTCCAC CTCAATTGCA GCCACCTGTA 
3401 ACGATCCAGG TATGCCCCAA AATGGCACCC GCTATGGAGA CAGCAGAGAG 
3451 GCTGGAGACA CCGTCACATT CCAGTGTGAC CCTGGCTATC AGCTCCAAGG 
3501 ACAAGCCAAA ATCACCTGTG TGCAGCTGAA TAACCGGTTC TTTTGGCAAC 
3551 CAGACCCTCC TACATGCATA GCTGCTTGTG GAGGGAATCT GACGGGCCCA 
3601 GCAGGTGTTA TTTTGTCACC CAACTACCCA CAGCCGTATC CTCCTGGGAA 
3651 GGAATGTGAC TGGAGAGTAA AAGTGAACCC GGACTTTGTC ATCGCCTTGA 
3701 TATTCAAAAG TTTCAACATG GAGCCCAGCT ATGACTTCCT ACACATCTAT 
3751 GAAGGGGAAG ATTCCAACAG CCCCCTCATT GGGAGTTACC AGGGCTCTCA 
3801 GGCCCCAGAA AGAATAGAGA GTAGCGGAAA CAGCCTGTTT CTGGCATTTC 
3851 GGAGTGATGC CTCCGTGGGC CTTTCAGGGT TCGCCATTGA ATTTAAAGAG 
3901 AAACCACGGG AAGCTTGTTT TGACCCAGGA AATATAATGA ATGGGACAAG 
3951 AGTTGGAACA GACTTCAAGC TTGGCTCCAC CATCACCTAC CAGTGTGACT 
4001 CTGGCTATAA GATTCTTGAC CCCTCATCCA TCACCTGTGT GATTGGGGCT 
4051 GATGGGAAAC CCTCCTGGGA CCAAGTGCTG CCCTCCTGCA ATGCTCCCTG 
4101 TGGAGGCCAG TACACGGGAT CAGAAGGGGT AGTTTTATCA CCAAACTACC 
4151 CCCATAATTA CACAGCTGGT CAAATATGCC TCTATTCCAT CACGGTACCA 
4201 AAGGAATTCG TGGTCTTTGG ACAGTTTGCC TATTTCCAGA CAGCCCTGAA 
4251 TGATTTGGCA GAATTATTTG ATGGAACCCA TGCACAGGCC AGACTTCTCA 
4301 GCTCACTCTC GGGGTCTCAC TCAGGGGAAA CATTGCCCTT GGCTACGTCA 
4351 AATCAAATTC TGCTCCGATT CAGTGCAAAG AGCGGTGCCT CTGCCCGCGG 
4401 CTTCCACTTC GTGTATCAAG CTGTTCCTCG TACCAGTGAC ACCCAATGCA 
4451 GCTCTGTCCC CGAGCCCAGA TACGGAAGGA GAATTGGTTC TGAGTTTTCT 
4501 GCCGGCTCCA TCGTCCGATT CGAGTGCAAC CCGGGATACC TGCTTCAGGG 
4551 TTCCACGGCG CTCCACTGCC AGTCCGTGCC CAACGCCTTG GCACAGTGGA 
4601 ACGACACGAT CCCCAGCTGT GTGGTACCCT GCAGTGGCAA TTTCACTCAA 
4651 CGAAGAGGTA CAATCCTGTC CCCCGGCTAC CCTGAGCCAT ACGGAAACAA 
4701 CTTGAACTGT ATATGGAAGA TCATAGTTAC GGAGGGCTCG GGAATTCAGA 
4751 TCCAAGTGAT CAGTTTTGCC ACGGAGCAGA ACTGGGACTC CCTTGAGATC 
4801 CACGATGGTG GGGATGTGAC CGCACCCAGA CTGGGAAGCT TCTCAGGCAC 
4851 CACAGTACCG GCACTGCTGA ACAGTACTTC CAACCAACTC TACCTGCATT 
4901 TCCAGTCTGA CATTAGTGTG GCAGCTGCTG GTTTCCACCT GGAATACAAA 
4951 ACTGTAGGTC TTGCTGCATG CCAAGAACCA GCCCTCCCCA GCAACAGCAT 
5001 CAAAATCGGA GATCGGTACA TGGTGAACGA CGTGCTCTCC TTCCAGTGCG 
5051 AGCCCGGGTA CACCCTGCAG GGCCGTTCCC ACATTTCCTG TATGCCAGGG 
5101 ACCGTTCGCC GTTGGAACTA TCCGTCTCCC CTGTGCATTG CAACCTGTGG 
5151 AGGGACGCTG AGCACCTTGG GTGGTGTGAT CCTGAGCCCC GGCTTCCCAG 
5201 GTTCTTACCC CAACAACTTA GACTGCACCT GGAGGATCTC ATTACCCATC 
5251 GGCTATGGTG CACATATTCA GTTTCTGAAT TTTTCTACCG AAGCTAATCA 
5301 TGACTTCCTT GAAATTCAAA ATGGACCTTA CCACACCAGC CCCATGATTG 
5351 GACAATTTAG CGGCACGGAT CTCCCCGCGG CCCTGCTGAG CACAACGCAT 
5401 GAAACCCTCA TCCACTTTTA TAGTGACCAT TCGCAAAACC GGCAAGGATT 
5451 TAAACTTGCT TACCAAGCCT ATGAATTACA GAACTGTCCA GATCCACCCC 
5501 CATTTCAGAA TGGGTACATG ATCAACTCGG ATTACAGCGT GGGGCAATCA 
5551 GTATCTTTCG AGTGTTATCC TGGGTACATT CTAATAGGCC ATCCTGTCCT 
5601 CACTTGTCAG CATGGGATCA ACAGAAACTG GAACTACCCT TTTCCAAGAT 
5651 GTGATGCCCC TTGTGGGTAC AACGTAACTT CTCAGAACGG CACCATCTAC 
5701 TCCCCTGGCT TTCCTGATGA GTATCCGATC CTGAAGGACT GCATTTGGCT 
5751 CATCACGGTG CCTCCAGGGC ACGGAGTTTA CATCAACTTC ACCCTGTTAC 
5801 AGACGGAAGC TGTCAACGAT TACATTGCTG TTTGGGACGG TCCCGATCAG 
5851 AACTCACCCC AGCTGGGAGT TTTCAGTGGC AACACAGCCC TCGAAACGGC 
5901 GTATAGCTCC ACCAACCAAG TCCTGCTCAA GTTCCACAGC GACTTTTCAA 
5951 ATGGAGGCTT CTTTGTCCTC AATTTCCACG GTCAGTTGAT TTTCACTCCG 
6001 TTAGTTAAGA CTGAGAATTC CATGTGGTGT TTACTGCAGT GTTGTCCCAC 
6051 GCCTTGTTTC CAGCTGAAGT TTCTTGATTC AGCCGAGGGC GTGTATGATT 
6101 CTTTTGCACT GGAGGCCAGC GTTTCCTGTG GTCCTTTTTT TGTTTAATGA 
6151 TGTCTTTATT ATTTCACATC GTATCCAGCT TGGATTTATT CCAAGATACA 
6201 TGTATCCTAA GTGAAACTCT AAGATGAAGA CCATTGAAAG AGATTTGGTA 
6251 CCTTTTATAG ATTTACTCAT CCCTGTCTCA AGATAAGGTG TTATAGCAAA 
6301 TGTCATGTAA CTATAAATGG TGTGAAAGCA AACCTCCAAT AATCCTGGGA 
6351 ATGCACTCTA AACGATATGT AGAACATCTG TCAATCNATC GCTTATCTCT 
6401 CACGAACAC 

SEQ ID NO:4 



5C3-3V3 Nucleotide sequence 5667 bp 

1 TTTTAGGGAT GGTATGAATT TAATATTTTT TAGTATTACA ATATATTCTT 

51 ATAAAAAAGG TCCAAGTGAA AAAGGCGATT GAGTTGAAGT CAAGAGGAGT 

101 CAAGATGCTG CCCAGCAAGG ATGGAAGCCA TAAAAACTCT GTCTGGCATA 

151 TGGAATAACA TCAACCATGT GACATCCGAA GAAGATACGT TCATTATGTA 

201 TCTGGGAAAA CCATGGCTTC AAGTGAAAAT TCAAGTGAGC CAAGGAGGTG 

251 TTGCATTGGT CTCTGACATG TGTCCAGATC CTGGGATTCC AGAAAATGGT 

301 AGAAGAGCAG GTTCCGACTT CAGGGTTGGT GCAAATGTAC AGTTTTCATG 

351 TGAGGACAAT TACGTGCTCC AGGGATCTAA AAGCATCACC TGTCAGAGAG 

401 TTACAGAGAC GCTCGCTGCT TGGAGTGACC ACAGGCCCAT CTGCCGAGCG 

451 AGAACATGTG GATCCAATCT GCGTGGGCCC AGCGGCGTCA TTACCTCCCC 

501 TAATTATCCG GTTCAGTATG AAGATAATGC ACACTGTGTG TGGGTCATCA 

551 CCACCACCGA CCCGGACAAG GTCATCAAGC TTGCCTTNGA AGAGTTTGAG 

601 CTGGAGCGAG GCTATGACAC CCTNACGGTT GGTGATGCTG GGAAGGTGGG 

651 AGACACCAGA TCGGTCTTGT ANGTGCTCAC GGGATCCAGT GTTCCTGACC 

701 TCATTGTGAG CATGAGCAAC CAGATGTGGC TACATCTGCA GTCGGATGAT 

751 AGCATTGGCT CACCTGGGTT TAAAGCTGTT TACCAAGAAA TTGAAAAGGG 

801 AGGGTGTGGG GATCCTGGAA TCCCCGCCTA TGGGAAGCGG ACGGGCAGCA 

851 GTTTCCTCCA TGGAGATACA CTCACCTTTG AATGCCCGGC GGCCTTTGAG 

901 CTGGTGGGGG AGAGAGTTAT CACCTGTCAG CAGAACAATC AGTGGTCTGG 

951 CAACAAGCCC AGCTGTGTAT TTTCATGTTT CTTCAACTTT ACGGCATCAT 

1001 CTGGGATTAT TCTGTCACCA AATTATCCAG AGGAATATGG GAACAACATG 

1051 AACTGTGTCT GGTTGATTAT CTCGGAGCCA GGAAGTCGAA TTCACCTAAT 

1101 CTTTAATGAT TTTGATGTTG AGCCTCAATT TGACTTTCTC GCGGTCAAGG 

1151 ATGATGGCAT TTCTGACATA ACTGTCCTGG GTACTTTTTC TGGCAATGAA 

1201 GTGCCTTCCC AGCTGGCCAG CAGTGGGCAT ATAGTTCGCT TGGAATTTCA 

1251 GTCTGACCAT TCCACTACTG GCAGAGGGTT CAACATCACT TACACCACAT 

1301 TTGGTCAGAA TGAGTGCCAT GATCCTGGCA TTCCTATAAA CGGACGACGT 

1351 TTTGGTGACA GGTTTCTACT CGGGAGCTCG GTTTCTTTCC ACTGTGATGA 

1401 TGGCTTTGTC AAGACCCAGG GATCCGAGTC CATTACCTGC ATACTGCAAG 

1451 ACGGGAACGT GGTCTGGAGC TCCACCGTGC CCCGCTGTGA AGCTCCATGT 

1501 GGTGGACATC TGACAGCGTC CAGCGGAGTC ATTTTGCCTC CTGGATGGCC 

1551 AGGATATTAT AAGGATTCTT TACATTGTGA ATGGATAATT GAAGCAAAAC 

1601 CAGGCCACTC TATCAAAATA ACTTTTGACA GATTTCAGAC AGAGGTCAAT 

1651 TATGACACCT TGGAGGTCAG AGATGGGCCA GCCAGTTCGT CCCCACTGAT 

1701 CGGCGAGTAC CACGGCACCC AGGCACCCCA GTTCCTCATC AGCACCGGGA 

1751 ACTTCATGTA CCTGCTATTC ACCACTGACA ACAGCCGCTC CAGCATCGGC 

1801 TTCCTCATCC ACTATGAGAG TGTGACGCTT GAGTCGGATT CCTGCCTGGA 

1851 CCCGGGCATC CCTGTGAACG GCCATCGCCA CGGTGGAGAC TTTGGCATCA 

1901 GGTCCACAGT GACTTTCAGC TGTGACCCGG GGTACACACT AAGTGACGAC 

1951 GAGCCCCTCG TCTGTGAGAG GAACCACCAG TGGAACCACG CCTTGCCCAG 

2001 CTGCGACGCT CTATGTGGAG GCTACATCCA AGGGAAGAGT GGAACAGTCC 

2051 TTTCTCCTGG GTTTCCAGAT TTTTATCCAA ACTCTCTAAA CTGCACGTGG 

2101 ACCATTGAAG TGTCTCATGG GAAAGGAGTT CAAATGATCT TTCACACCTT 

2151 TCATCTTGAG AGTTCCCACG ACTATTTACT GATCACAGAG GATGGAAGTT 

2201 TTTCCGAGCC CGTTGCCAGG CTCACCGGGT CGGTGTTGCC TCATACGATC 

2251 AAGGCAGGCC TGTTNGGAAA CTTCACTGCC CAGCTTCGGT TTATATCAGA 

2301 CTTCTCAATT TCGTACGAGG GCTTCAATAT CACATTTTCA GAATATGACC 

2351 TGGAGCCATG TGATGATCCT GGAGTCCCTG CCTTCAGCCG AAGAATTGGT 

2401 TTTCACTTTG GTGTGGGAGA CTCTCTGACG TTTTCCTGCT TCCTGGGATA 

2451 TCGTTTAGAA GGTGCCACCA AGCTTACCTG CCTGGGTGGG GGCCGCCGTG 

2501 TGTGGAGTGC ACCTCTGCCA AGGTGTGTGG CCGAATGTGG AGCAAGTGTC 

2551 AAAGGAAATG AAGGAACATT ACTGTCTCCA AATTTTCCAT CCAATTATGA 

2601 TAATAACCAT GAGTGTATCT ATAAAATAGA AACAGAAGCC GGCAAGGGCA 

2651 TCCACCTTAG AACACGAAGC TTCCAGCTGT TTGAAGGAGA TACTCTAAAG 

2701 GTATATGATG GAAAAGACAG TTCCTCACGT CCACTGGGCA CGTTCACTAA 

2751 AAATGAACTT CTGGGGCTGA TCCTAAACAG CACATCCAAT CACCTGTGGC 

2801 TAGAGTTCAA CACCAATGGA TCTGACACCG ACCAAGGTTT TCAACTCACC 

2851 TATACCAGTT TTGATCTGGT AAAATGTGAG GATCCGGGCA TCCCTAACTA 

2901 CGGCTATAGG ATCCGTGATG AAGGCCACTT TACCGACACT GTAGTTCTGT 

2951 ACAGTTGCAA CCCGGGGTAC GCCATGCATG GCAGCAACAC CCTGACCTGT 
3001 TTGAGTGGAG ACAGGAGAGT GTGGGACAAA CCACTACCTT CGTGCATAGC 

3051 GGAATGTGGT GGTCAGATCC ATGCAGCCAC ATCAGGACGA ATATTGTCCC 

3101 CTGGCTATCC AGCTCCGTAT GACAACAACC TCCACTGCAC CTGGATTATA 

3151 GAGGCAGACC CAGGAAAGAC CATTAGCCTC CATTTCATTG TTTTCGACAC 

3201 GGAGATGGCT CACGACATCC TCAAGGTCTG GGACGGGCCG GTGGACAGTG 

3251 ACATCCTGCT GAAGGAGTGG AGTGGCTCCG CCCTTCCGGA GGACATCCAC 

3301 AGCACCTTCA ACTCACTCAC CCTGCAGTTC GACAGCGACT TCTTCATCAG 

3351 CAAGTCTGGC TTCTCCATCC AGTTCTCCAC CTCAATTGCA GCCACCTGTA 

3401 ACGATCCAGG TATGCCCCAA AATGGCACCC GCTATGGAGA CAGCAGAGAG 

3451 GCTGGAGACA CCGTCACATT CCAGTGTGAC CCTGGCTATC AGCTCCAAGG 

3501 ACAAGCCAAA ATCACCTGTG TGCAGCTGAA TAACCGGTTC TTTTGGCAAC 

3551 CAGACCCTCC TACATGCATA GCTGCTTGTG GAGGGAATCT GACGGGCCCA 

3601 GCAGGTGTTA TTTTGTCACC CAACTACCCA CAGCCGTATC CTCCTGGGAA 

3651 GGAATGTGAC TGGAGAGTAA AAGTGAACCC GGACTTTGTC ATCGCCTTGA 

3701 TATTCAAAAG TTTCAACATG GAGCCCAGCT ATGACTTCCT ACACATCTAT 

3751 GAAGGGGAAG ATTCCAACAG CCCCCTCATT GGGAGTTACC AGGGCTCTCA 

3801 GGCCCCAGAA AGAATAGAGA GTAGCGGAAA CAGCCTGTTT CTGGCATTTC 

3851 GGAGTGATGC CTCCGTGGGC CTTTCAGGGT TCGCCATTGA ATTTAAAGAG 

3901 AAACCACGGG AAGCTTGTTT TGACCCAGGA AATATAATGA ATGGGACAAG 

3951 AGTTGGAACA GACTTCAAGC TTGGCTCCAC CATCACCTAC CAGTGTGACT 

4001 CTGGCTATAA GATTCTTGAC CCCTCATCCA TCACCTGTGT GATTGGGGCT 

4051 GATGGGAAAC CCTCCTGGGA CCAAGTGCTG CCCTCCTGCA ATGCTCCCTG 

4101 TGGAGGCCAG TACACGGGAT CAGAAGGGGT AGTTTTATCA CCAAACTACC 

4151 CCCATAATTA CACAGCTGGT CAAATATGCC TCTATTCCAT CACGGTACCA 

4201 AAGGAATTCG TGGTCTTTGG ACAGTTTGCC TATTTCCAGA CAGCCCTGAA 

4251 TGATTTGGCA GAATTATTTG ATGGAACCCA TGCACAGGCC AGACTTCTCA 

4301 GCTCACTCTC GGGGTCTCAC TCAGGGGAAA CATTGCCCTT GGCTACGTCA 

4351 AATCAAATTC TGCTCCGATT CAGTGCAAAG AGCGGTGCCT CTGCCCGCGG 

4401 CTTCCACTTC GTGTATCAAG CTGTTCCTCG TACCAGTGAC ACCCAATGCA 

4451 GCTCTGTCCC CGAGCCCAGA TACGGAAGGA GAATTGGTTC TGAGTTTTCT 

4501 GCCGGCTCCA TCGTCCGATT CGAGTGCAAC CCGGGATACC TGCTTCAGGG 

4551 TTCCACGGCG CTCCACTGCC AGTCCGTGCC CAACGCCTTG GCACAGTGGA 

4601 ACGACACGAT CCCCAGCTGT GTGGTACCCT GCAGTGGCAA TTTCACTCAA 

4651 CGAAGAGGTA CAATCCTGTC CCCCGGCTAC CCTGAGCCAT ACGGAAACAA 

4701 CTTGAACTGT ATATGGAAGA TCATAGTTAC GGAGGGCTCG GGAATTCAGA 

4751 TCCAAGTGAT CAGTTTTGCC ACGGAGCAGA ACTGGGACTC CCTTGAGATC 

4801 CACGATGGTG GGGATGTGAC CGCACCCAGA CTGGGAAGCT TCTCAGGCAC 

4851 CACAGTACCG GCACTGCTGA ACAGTACTTC CAACCAACTC TACCTGCATT 

4901 TCCAGTCTGA CATTAGTGTG GCAGCTGCTG GTTTCCACCT GGAATACAAA 

4951 ACTGTAGGTC TTGCTGCATG CCAAGAACCA GCCCTCCCCA GCAACAGCAT 
5001 CAAAATCGGA GATCGGTACA TGGTGAACGA CGTGCTCTCC TTCCAGTGCG 
5051 AGCCCGGGTA CACCCTGCAG GGCCGTTCCC ACATTTCCTG TATGCCAGGG 
5101 ACCGTTCGCC GTTGGAACTA TCCGTCTCCC CTGTGCATTG CAACCTGTGG 
5151 AGGGACGCTG AGCACCTTGG GTGGTGTGAT CCTGAGCCCC GGCTTCCCAG 
5201 GTTCTTACCC CAACAACTTA GACTGCACCT GGAGGATCTC ATTACCCATC 
5251 GGCTATGGTG CACATATTCA GTTTCTGAAT TTTTCTACCG AAGCTAATCA 
5301 TGACTTCCTT GAAATTCAAA ATGGACCTTA CCACACCAGC CCCATGATTG 
5351 GACAATTTAG CGGCACGGAT CTCCCCGCGG CCCTGCTGAG CACAACGCAT 
5401 GAAACCCTCA TCCACTTTTA TAGTGACCAT TCGCAAAACC GGCAAGGATT 
5451 TAAACTTGCT TACCAAGCCT AATCTGGAAA CATTGGTCCT GCTTTCCCAT 
5501 GTCTTGACAC CCCATTCCAA GCCAGATGTC AAGGAGAAGA AAGGACTTTC 
5551 AATTAAAAAA AAAACAAAAA CTCGAAACAA CATGTTTTTT ATTGTACGCC 

5601 ATTAATTTCC TATCACTGAG ATATAAAAAT AAATAATGCC NAAAAAAAAA 
5651 AAAAAAAAAA AAAAAAA 

SEQ ID NO:5 



5R-3V2 Nucleotide sequence 7323 bp 

1 GCGTCGGATG CGCGGCGGGT CTTGGGACCG GGCNCTCTCT CCGGCTCGCC 

51 TTGCCCTCGG GTGATTATTT GGCTCCGCTC ATAGCCCTGC CTTCCTCGGA 

101 GGAGCCATCG GTGTCGCGTG CGTGTGGNGT ATCTGCAGAC ATGACTGCGT 

151 GGAGGAGATT CCAGTCGCTG CTCCTGCTTC TCGGGCTGCT GGTGCTGTGC 

201 GCGAGGCTCC TCACTGCAGC GAAGGGTCAG AACTGTGGAG GCTTAGTCCA 

251 GGGTCCCAAT GGCACTATTG AGAGCCCAGG GTTTCCTCAC GGGTATCCGA 

301 ACTATGCCAA CTGCACCTGG ATCATCATCA CGGGCGAGCG CAATAGGATA 

351 CAGTTGTCCT TCCATACCTT TGCTCTTGAA GAAGATTTTG ATATTTTATC 

401 AGTTTACGAT GGACAGCCTC AACAAGGGAA TTTAAAAGTG AGATTATCGG 

451 GATTTCAGCT GCCCTCCTCT ATAGTGAGTA CAGGATCTAT CCTCACTCTG 

501 TGGTTCACGA CAGACTTCGC TGTGAGTGCC CAAGGTTTCA AAGCATTATA 

551 TGAAGTTTTA CCTAGCCACA CTTGTGGAAA TCCTGGAGAA ATCCTGAAAG 

601 GAGTTCTGCA TGGAACGAGA TTCAACATAG GAGACAANAT CCGGTACAGC 

651 TGCCTCCCTG GCTACATCTT GGAAGGCCAC GCCATCCXGA CCTGCATCGT 

701 CAGCCCAGGA AATGGTGCAT CGTGGGACTT CCCAGCTCCC TTTTGCAGAG 

751 CTGAGGGAGC CTGCGGAGGA ACCTTACGCG GGACCAGCAG CTCCATCTCC 

801 AGCCCGCACT TCCCTTCAGA GTACGAGAAC AACGCGGACT GCACCTGGAC

851 CATTCTGGCT GAGCCCGGGG ACACCATTGC GCTGGTCTTC ACTGACTTTC 

901 AGCTAGAAGA AGGATATGAT TTCTTAGAGA TCAGTGGCAC GGAAGCTCCA 

951 TCCATATGGC TAACTGGCAT GAACCTCCCC TCTCCAGTTA TCAGTAGCAA 

1001 GAATTGGCTA CGACTCCATT TCACCTCTGA CAGCAACCAC CGACGCAAAG 

1051 GATTTAACGC TCAGTTCCAA GTGAAAAAGG CGATTGAGTT GAAGTCAAGA

1101 GGAGTCAAGA TGCTGCCCAG CAAGGATGGA AGCCATAAAA ACTCTGTCTT

1151 GAGCCAAGGA GGTGTTGCAT TGGTCTCTGA CATGTGTCCA GATCCTGGGA 

1201 TTCCAGAAAA TGGTAGAAGA GCAGGTTCCG ACTTCAGGGT TGGTGCAAAT 

1251 GTACAGTTTT CATGTGAGGA CAATTACGTG CTCCAGGGAT CTAAAAGCAT 

1301 CACCTGTCAG AGAGTTACAG AGACGCTCGC TGCTTGGAGT GACCACAGGC

1351 CCATCTGCCG AGCGAGAACA TGTGGATCCA ATCTGCGTGG GCCCAGCGGC 

1401 GTCATTACCT CCCCTAATTA TCCGGTTCAG TATGAAGATA ATGCACACTG

1451 TGTGTGGGTC ATCACCACCA CCGACCCGGA CAAGGTCATC AAGCTTGCCT

1501 TNGAAGAGTT TGAGCTGGAG CGAGGCTATG ACACCCTNAC GGTTGGTGAT 

1551 GCTGGGAAGG TGGGAGACAC CAGATCGGTC TTGTANGTGC TCACGGGATC

1601 CAGTGTTCCT GACCTCATTG TGAGCATGAG CAACCAGATG TGGCTACATC

1651 TGCAGTCGGA TGATAGCATT GGCTCACCTG GGTTTAAAGC TGTTTACCAA 

1701 GAAATTGAAA AGGGAGGGTG TGGGGATCCT GGAATCCCCG CCTATGGGAA 

1751 GCGGACGGGC AGCAGTTTCC TCCATGGAGA TNCACTNACC TTTGAATGCC 

1801 CGGCGGCCTT TGAGCTGGTG GGGGAGAGAG TTATCACCTG TCAGCAGAAC

1851 AATCAGTGGT CTGGCAACAA GCCCAGCTGT GTATTTTCAT GTTTCTTCAA 

1901 CTTTACGGCA TCATCTGGGA TTATTCTGTC ACCAAATTAT CCAGAGGAAT 

1951 ATGGGAACAA CATGAACTGT GTCTGGTTGA TTATCTCGGA GCCAGGAAGT 

2001 CGAATTCACC TAATCTTTAA TGATTTTGAT GTTGAGCCTC AATTTGACTT 

2051 TCTCGCGGTC AAGGATGATG GCATTTCTGA CATAACTGTC CTGGGTACTT

2101 TTTCTGGCAA TGAAGTGCCT TCCCAGCTGG CCAGCAGTGG GCATATAGTT 

2151 CGCTTGGAAT TTCAGTCTGA CCATTCCACT ACTGGCAGAG GGTTNAACAT 

2201 CACTTACACC ACNTTTGGTC AGAATGAGTG CCATGATCCT GGCATTCCTA 

2251 TAAACGGACG ACGTTTTGGT GACAGGTTTC TACTCGGGAG CTCGGTTTCT

2301 TTCCACTGTG ATGATGGCTT TGTCAAGACC CAGGGATCCG AGTCCATTAC

2351 CTGCATACTG CAAGACGGGA ACGTGGTCTG GAGCTCCACC GTGCCCCGCT 

2401 GTGAAGCTCC ATGTGGTGGA CATCTGACAG CGTCCAGCGG AGTCATTTTG 

2451 CCTCCTGGAT GGCCAGGATA TTATAAGGAT TCTTTACATT GTGAATGGAT

2501 AATTGAAGCA AAACCAGGCC ACTCTATCAA AATAACTTTT GACAGATTTC 

2551 AGACAGAGGT CAATTATGAC ACCTTGGAGG TCAGAGATGG GCCAGCCAGT

2601 TCGTCCCCAC TGATCGGCGA GTACCACGGC ACCCAGGCAC CCCAGTTCCT

2651 CATCAGCACC GGGAACTTCA TGTACCTGCT ATTCACCACT GACAACAGCC 

2701 GCTCCAGCAT CGGCTTCCTC ATCCACTATG AGAGTGTGAC GCTTGAGTCG

2751 GATTCCTGCC TGGACCCGGG CATCCCTGTG AACGGCCATC GCCACGGTGG 

2801 AGACTTTGGC ATCAGGTCCA CAGTGACTTT CAGCTGTGAC CCGGGGTACA

2851 CACTAAGTGA CGACGAGCCC CTCGTCTGTG AGAGGAACCA CCAGTGGAAC 

2901 CACGCCTTGC CCAGCTGCGA CGCTCTATGT GGAGGCTACA TCCAAGGGAA 

2951 GAGTGGAACA GTCCTTTCTC CTGGGTTTCC AGATTTTTAT CCAAACTCTC

3001 TAAACTGCAC GTGGACCATT GAAGTGTCTC ATGGGAAAGG AGTTCAAATG

3051 ATCTTTCACA CCTTTCATCT TGAGAGTTCC CACGACTATT TACTGATCAC 

3101 AGAGGATGGA AGTTTTTCCG AGCCCGTTGC CAGGCTCACC GGGTCGGTGT

3151 TGCCTCATAC GATCAAGGCA GGCCTGTTNG GAAACTTCAC TGCCCAGCTT 

3201 CGGTTTATAT CAGACTTCTC AATTTCGTAC GAGGGCTTCA ATATCACATT

3251 TTCAGAATAT GACCTGGAGC CATGTGATGA TCCTGGAGTC CCTGCCTTCA

3301 GCCGAAGAAT TGGTTTTCAC TTTGGTGTGG GAGACTCTCT GACGTTTTCC 

3351 TGCTTCCTGG GATATCGTTT AGAAGGTGCC ACCAAGCTTA CCTGCCTGGG 

3401 TGGGGGCCGC CGTGTGTGGA GTGCACCTCT GCCAAGGTGT GTGGCCGAAT

3451 GTGGAGCAAG TGTCAAAGGA AATGAAGGAA CATTACTGTC TCCAAATTTT

3501 CCATCCAATT ATGATAATAA CCATGAGTGT ATCTATAAAA TAGAAACAGA 

3551 AGCCGGCAAG GGCATCCACC TTAGAACACG AAGCTTGCAG CTGTTTGAAG 

3601 GAGATACTCT AAAGGTATAT GATGGAAAAG ACAGTTCCTC ACGTCCACTG 

3651 GGCACGTTCA CTAAAAATGA ACTTCTGGGG CTGATCCTAA ACAGGACATC 

3701 CAATCACCTG TGGCTAGAGT TCAACACCAA TGGATCTGAC ACCGACCAAG 

3751 GTTTTCAACT CACCTATACC AGTTTTGATC TGGTAAAATG TGAGGATCCG 

3801 GGCATCCCTA ACTACGGCTA TAGGATCCGT GATGAAGGCC ACTTTACCGA 

3851 CACTGTAGTT CTGTACAGTT GCAACCCGGG GTACGCCATG CATGGCAGCA 

3901 ACACCCTGAC CTGTTTGAGT GGAGACAGGA GAGTGTGGGA CAAACCACTA

3951 CCTTCGTGCA TAGCGGAATG TGGTGGTCAG ATCCATGGAG CCACATCAGG 

4001 ACGAATATTG TCCCCTGGCT ATCCAGCTCC GTATGACAAC AACCTCCACT

4051 GCACCTGGAT TATAGAGGCA GACCCAGGAA AGACCATTAG CCTCCATTTC

4101 ATTGTTTTCG ACACGGAGAT GGCTCACGAC ATCCTCAAGG TCTGGGACGG 

4151 GCCGGTGGAC AGTGACATCC TGCTGAAGGA GTGGAGTGGC TCCGCCCTTC 

4201 CGGAGGACAT CCACAGCACC TTCAACTCAC TCACCCTGCA GTTCGACAGC

4251 GACTTCTTCA TCAGCAAGTC TGGCTTCTCC ATCCAGTTCT CCACCTCAAT 

4301 TGCAGCCACC TGTAACGATC CAGGTATGCC CCAAAATGGC ACCCGCTATG

4351 GAGACAGCAG AGAGGCTGGA GACACCGTCA CATTCCAGTG TGACCCTGGC

4401 TATCAGCTCC AAGGACAAGC CAAAATCACC TGTGTGCAGC TGAATAACCG

4451 GTTCTTTTGG CAACCAGACC CTCCTACATG CATAGCTGCT TGTGGAGGGA

4501 ATCTGACGGG CCCAGCAGGT GTTATTTTGT CACCCAACTA CCCACAGCCG

4551 TATCCTCCTG GGAAGGAATG TGACTGGAGA GTAAAAGTGA ACCCGGACTT

4601 TGTCATCGCC TTGATATTCA AAAGTTTCAA CATGGAGCCC AGCTATGACT

4651 TCCTACACAT CTATGAAGGG GAAGATTCCA ACAGCCCCCT CATTGGGAGT

4701 TACCAGGGCT CTCAGGCCCC AGAAAGAATA GAGAGTAGCG GAAACAGCCT

4751 GTTTCTGGCA TTTCGGAGTG ATGCCTCCGT GGGCCTTTCA GGGTTCGCCA

4801 TTGAATTTAA AGAGAAACCA CGGGAAGCTT GTTTTGACCC AGGAAATATA

4851 ATGAATGGGA CAAGAGTTGG AACAGACTTC AAGCTTGGCT CCACCATCAC

4901 CTACCAGTGT GACTCTGGCT ATAAGATTCT TGACCCCTCA TCCATCACCT

4951 GTGTGATTGG GGCTGATGGG AAACCCTCCT GGGACCAAGT GCTGCCCTCC

5001 TGCAATGCTC CCTGTGGAGG CCAGTACACG GGATCAGAAG GGGTAGTTTT 

5051 ATCACCAAAC TACCCCCATA ATTACACAGC TGGTCAAATA TGCCTCTATT 

5101 CCATCACGGT ACCAAAGGAA TTCGTGGTCT TTGGACAGTT TGCCTATTTC 

5151 CAGACAGCCC TGAATGATTT GGCAGAATTA TTTGATGGAA CCCATGCACA 

5201 GGCCAGACTT CTCAGCTCAC TCTCGGGGTC TCACTCAGGG GAAACATTGC

5251 CCTTGGCTAC GTCAAATCAA ATTCTGCTCC GATTCAGTGC AAAGAGCGGT 

5301 GCCTCTGCCC GCGGCTTCCA CTTCGTGTAT CAAGCTGTTC CTCGTACCAG 

5351 TGACACCCAA TGCAGCTCTG TCCCCGAGCC CAGATACGGA AGGAGAATTG 

5401 GTTCTGAGTT TTCTGCCGGC TCCATCGTCC GATTCGAGTG CAACCCGGGA

5451 TACCTGCTTC AGGGTTCCAC GGCGCTCCAC TGCCAGTCCG TGCCCAACGC 

5501 CTTGGCACAG TGGAACGACA CGATCCCCAG CTGTGTGGTA CCCTGCAGTG 

5551 GCAATTTCAC TCAACGAAGA GGTACAATCC TGTCCCCCGG CTACCCTGAG 

5601 CCATACGGAA ACAACTTGAA CTGTATATGG AAGATCATAG TTACGGAGGG

5651 CTCGGGAATT CAGATCCAAG TGATCAGTTT TGCCACGGAG CAGAACTGGG 

5701 ACTCCCTTGA GATCCACGAT GGTGGGGATG TGACCGCACC CAGACTGGGA

5751 AGCTTCTCAG GCACCACAGT ACCGGCACTG CTGAACAGTA CTTCCAACCA

5801 ACTCTACCTG CATTTCCAGT CTGACATTAG TGTGGCAGCT GCTGGTTTCC 

5851 ACCTGGAATA CAAAACTGTA GGTCTTGCTG CATGCCAAGA ACCAGCCCTC 

5901 CCC^GCAACA GCATCAAAAT CGGAGATCGG TACATGGTGA ACGACGTGCT 

5951 CTCCTTCCAG TGCGAGCCCG GGTACACCCT GCAGGGCCGT TCCCACATTT 

6001 CCTGTATGCC AGGGACCGTT CGCCGTTGGA ACTATCCGTC TCCCCTGTGC 

6051 ATTGCAACCT GTGGAGGGAC GCTGAGCACC TTGGGTGGTG TGATCCTGAG 

6101 CCCCGGCTTC CCAGGTTCTT ACCCCAACAA CTTAGACTGC ACCTGGAGGA 



6151 TCTCATTACC CATCGGCTAT GGTGCACATA TTCAGTTTCT GAATTTTTCT

6201 ACCGAAGCTA ATCATGACTT CCTTGAAATT CAAAATGGAC CTTACCACAC

6251 CAGCCCCATG ATTGGACAAT TTAGCGGCAC GGATCTCCCC GCGGCCCTGC

6301 TGAGCACAAC GCATGAAACC CTCATCCACT TTTATAGTGA CCATTCGCAA

6351 AACCGGCAAG GATTTAAACT TGCTTACCAA GCCTATGAAT TACAGAACTG

6401 TCCAGATCCA CCCCCATTTC AGAATGGGTA CATGATCAAC TCGGATTACA

6451 GCGTGGGGCA ATCAGTATCT TTCGAGTGTT ATCCTGGGTA CATTCTAATA

6501 GGCCATCCTG TCCTCACTTG TCAGCATGGG ATCAACAGAA ACTGGAACTA

6551 CCCTTTTCCA AGATGTGATG CCCCTTGTGG GTACAACGTA ACTTCTCAGA

6601 ACGGCACCAT CTACTCCCCT GGCTTTCCTG ATGAGTATCC GATCCTGAAG

6651 GACTGCATTT GGCTCATCAC GGTGCCTCCA GGGCACGGAG TTTACATCAA

6701 CTTCACCCTG TTACAGACGG AAGCTGTCAA CGATTACATT GCTGTTTGGG

6751 ACGGTCCCGA TCAGAACTCA CCCCAGCTGG GAGTTTTCAG TGGCAACACA

6801 GCCCTCGAAA CGGCGTATAG CTCCACCAAC CAAGTCCTGC TCAAGTTCCA

6851 CAGCGACTTT TCAAATGGAG GCTTCTTTGT CCTCAATTTC CACGGTCAGT

6901 TGATTTTCAC TCCGTTAGTT AAGACTGAGA ATTCCATGTG GTGTTTACTG

6951 CAGTGTTGTC CCACGCCTTG TTTCCAGCTG AAGTTTCTTG ATTCAGCCGA

7001 GGGCGTGTAT GATTCTTTTG CACTGGAGGC CAGCGTTTCC TGTGGTCCTT

7051 TTTTTGTTTA ATGATGTCTT TATTATTTCA CATCGTATCC AGCTTGGATT

7101 TATTCCAAGA TACATGTATC CTAAGTGAAA CTCTAAGATG AAGACCATTG

7151 AAAGAGATTT GGTACCTTTT ATAGATTTAC TCATCCCTGT CTCAAGATAA

7201 GGTGTTATAG CAAATGTCAT GTAACTATAA ATGGTGTGAA AGCAT^CCTC

7251 CAATAATCCT GGGAATGCAC TCTAAACGAT ATGTAGAACA TCTGTCAATC

7301 NATCGCTTAT CTCTCACGAA CAC
5R23V2



AGCTTGTGCCCTTTCCACCTGCATTTCTGATCTAAGTTAGGTAGGGGGCTGCTCTCTGGTC 
AGCAAGGAAGGGAGATCAAAGGATGGAGGCGGGACTCTGCCCCTGCAGAAACCCTCCAG
TTTGCTGGAGTTGCCGGATTACATTGTTCCTCCCCGGTGTGCGGCGTGAGCTTCCCCCACC 
CGAGCGCCCAACAAGTCTCCTTTCTCCAGCGTGCGCGCTGCTGCGCTGAGGCCGAATGAA 
GCGCAGCACGGTGCGGGCAGCCCGAGGCCCCGAGGCTGGGCTCTGTCTGTCTGGGACTGC 
GCCGTGCCCAGCCTCGGTCCCCTCTCTGTGGGTAAGGATGGTTGAGTCCAGCCTCCACGG 

CAGCGGCTCCTTGTGCCAGTAGCAGCCCTTCTTCTGCGCTCTCCGCCTTTTCTCTCTAGAC
TGGATCTCTCCTCCCCCCGCGCCCCCCTCCCCGCATCTCCCACTCGCTGGCTCTCTCTCCA 
GCTGCCTCCTCTCCAGGTCTCTCCTGGCTGCGCGCGCTCCTCTCCCCGCTTCTCCCCCTCCC 
GCAGCCTCGCCGCCTTGGTGCCTTCCTGCCCGGCTCGGCCGGCGCTCGTCCCCGGCCCCG 
GCCCCGCCAGCCCGGGTCTCCGCGCTCGGAGCAGCTCAGCCCTGCAGTGGCTCGGGACCC 

GATGCTATGAGAGGGAAGCGAGCCGGGCGCCCAGACCTTCAGGAGGCGTCGGATGCGCG
GCGGGTCTTGGGACCGGGCTCTCTCTCCGGCTCGCCTTGCCCTCGGGTGATTATTTGGCTC 
CGCTCATAGCCCTGCCTTCCTCGGAGGAGCCATCGGTGTCGCGTGCGTGTGGAGTATCTG 
CAGACATGACTGCGTGGAGGAGATTCCAGTCGCTGCTCCTGCTTCTCGGGCTGCTGGTGC 
TGTGCGCGAGGCTCCTCACTGCAGCGAAGGGTCAGAACTGTGGAGGCTTAGTCCAGGGTC 

CCAATGGCACTATTGAGAGCCCAGGGTTTCCTCACGGGTATCCGAACTATGCCAACTGCA
CCTGGATCATCATCACGGGCGAGCGCAATAGGATACAGTTGTCCTTCCATACCTTTGCTCT 
TGAAGAAGATTTTGATATTTTATCAGTTTACGATGGACAGCCTCAACAAGGGAATTTAAA 
AGTGAGATTATCGGGATTTCAGCTGCCCTCCTCTATAGTGAGTACAGGATCTATCCTCACT 
CTGTGGTTCACGACAGACTTCGCTGTGAGTGCCCAAGGTTTCAAAGCATTATATGAAGTT 

TTACCTAGCCACACTTGTGGAAATCCTGGAGAAATCCTGAAAGGAGTTCTGCATGGAACG
AGATTCAACATAGGAGACAANATCCGGTACAGCTGCCTCCCTGGCTACATCTTGGAAGGC 
CACGCCATCCTGACCTGCATCGTCAGCCCAGGAAATGGTGCATCGTGGGACTTCCCAGCT 
CCCTTTTGCAGAGCTGAGGGAGCCTGCGGAGGAACCTTACGCGGGACCAGCAGCTCCATC 
TCCAGCCCGCACTTCCCTTCAGAGTACGAGAACAACGCGGACTGCACCTGGACCATTCTG 

GCTGAGCCCGGGGACACCATTGCGCTGGTCTTCACTGACTTTCAGCTAGAAGAAGGATAT
GATTTCTTAGAGATCAGTGGCACGGAAGCTCCATCCATATGGCTAACTGGCATGAACCTC 
CCCTCTCCAGTTATCAGTAGCAAGAATTGGCTACGACTCCATTTCACCTCTGACAGCAACC 
ACCGACGCAAAGGATTTAACGCTCAGTTCCAAGTGAAAAAGGCGATTGAGTTGAAGTCA 
AGAGGAGTCAAGATGCTGCCCAGCAAGGATGGAAGCCATAAAAACTCTGTCTTGAGCCA 

AGGAGGTGTTGCATTGGTCTCTGACATGTGTCCAGATCCTGGGATTCCAGAAAATGGTAG
AAGAGCAGGTTCCGACTTCAGGGTTGGTGCAAATGTACAGTTTTCATGTGAGGACAATTA 
CGTGCTCCAGGGATCTAAAAGCATCACCTGTCAGAGAGTTACAGAGACGCTCGCTGCTTG 
GAGTGACCACAGGCCCATCTGCCGAGCGAGAACATGTGGATCCAATCTGCGTGGGCCCAG 
CGGCGTCATTACCTCCCCTAATTATCCGGTTCAGTATGAAGATAATGCACACTGTGTGTG 

GGTCATCACCACCACCGACCCGGACAAGGTCATCAAGCTTGCCTTNGAAGAGTTTGAGCT
GGAGCGAGGCTATGACACCCTNACGGTTGGTGATGCTGGGAAGGTGGGAGACACCAGAT 
CGGTCTTGTANGTGCTCACGGGATCCAGTGTTCCTGACCTCATTGTGAGCATGAGCAACC 
AGATGTGGCTACATCTGCAGTCGGATGATAGCATTGGCTCACCTGGGTTTAAAGCTGTTT 
ACCAAGAAATTGAAAAGGGAGGGTGTGGGGATCCTGGAATCCCCGCCTATGGGAAGCGG 

ACGGGCAGCAGTTTCCTCCATGGAGATNCACTNACCTTTGAATGCCCGGCGGCCTTTGAG
CTGGTGGGGGAGAGAGTTATCACCTGTCAGCAGAACAATCAGTGGTCTGGCAACAAGCCC 
AGCTGTGTATTTTCATGTTTCTTCAACTTTACGGCATCATCTGGGATTATTCTGTCACCAA 
ATTATCCAGAGGAATATGGGAACAACATGAACTGTGTCTGGTTGATTATCTCGGAGCCAG 
GAAGTCGAATTCACCTAATCTTTAATGATTTTGATGTTGAGCCTCAATTTGACTTTCTCGC 

GGTCAAGGATGATGGCATTTCTGACATAACTGTCCTGGGTACTTTTTCTGGCAATGAAGT
GCCTTCCCAGCTGGCCAGCAGTGGGCATATAGTTCGCTTGGAATTTCAGTCTGACCATTCC 
ACTACTGGCAGAGGGTTNAACATCACTTACACCACNTTTGGTCAGAATGAGTGCCATGAT 
CCTGGCATTCCTATAAACGGACGACGTTTTGGTGACAGGTTTCTACTCGGGAGCTCGGTT 
TCTTTCCACTGTGATGATGGCTTTGTCAAGACCCAGGGATCCGAGTCCATTACCTGCATAC 

TGCAAGACGGGAACGTGGTCTGGAGCTCCACCGTGCCCCGCTGTGAAGCTCCATGTGGTG
GACATCTGACAGCGTCCAGCGGAGTCATTTTGCCTCCTGGATGGCCAGGATATTATAAGG 
ATTCTTTACATTGTGAATGGATAATTGAAGCAAAACCAGGCCACTCTATCAAAATAACTT 






TTGACAGATTTCAGACAGAGGTCAATTATGACACCTTGGAGGTCAGAGATGGGCCAGCCA 
GTTCGTCCCCACTGATCGGCGAGTACCACGGCACCCAGGCACCCCAGTTCCTCATCAGCA 
CCGGGAACTTCATGTACCTGCTATTCACCACTGACAACAGCCGCTCCAGCATCGGCTTCCT

CXTCCACTATGAGAGTGTGACGCTTGAGTCGGATTCCTGCCTGGACCCGGGCATCCCTGT 
GAACGGCCATCGCCACGGTGGAGACITTGGCATCAGGTCCACAGTGACTTTCAGCTGTGA
CCCGGGGTAGACACTAAGTGACGACGAGCCCCTCGTCTGTGAGAGGAACCACCAGTGGA 
ACCACGCCTTGCCCAGCTGCGACGCTCTATGTGGAGGCTACATCCAAGGGAAGAGTGGAA 
CAGTCCTTTCTCCTGGGTTTCCAGATTTTTATCCAAACTCTCTAA^ 

TGAAGTGTCTCATGGGAAAGGAGTTCAAATGATCTTTCACACCTTTCATCTTGAGAGTTCC 

CACGACTATTTACTGATCACAGAGGATGGAAGTTTTTCCGAGCCCGTTGCCAGGCTCACC
GGGTCGGTGTTGCCTCATACGATCAAGGCAGGCCTGTTNGGAAACTTCACTGCCCAGCTT 
CGGTTTATATCAGACTTCTCAATTTCGTACGAGGGCTTCAATATCACATTTTCAGAATATG 
ACCTGGAGCCATGTGATGATCCTGGAGTCCCTGCCTTCAGCCGAAGAATTGGTTTTCACTT 
TGGTGTGGGAGACTCTCTGACGTTTTCCTGCTTCCTGGGATATCGTTTAGAAGGTGCCACC 

AAGCTTACCTGCCTGGGTGGC5GGCCGCCGTGTGTGGAGTGCACCTCTGCCAAGGTGTGTG
GCCGAATGTGGAGCAAGTGTCAAAGGAAATGAAGGAACATTACTGTCTCCAAATTTTCCA 
TCCAATTATGATAATAACCATGAGTGTATCTATAAAATAGAAACAGAAGCCGGCAAGGGC 
ATCCACCTTAGAACACGAAGCTTCCAGCTGTTTGAAGGAGATACTCTAAAGGTATATGAT 
GGAAAAGACAGTTCCTCACGTCCACTGGGCACGTTCACTAAAAATGAACTTCTGGGGCTG 

ATCCTAAACAGCACATCCAATCACCTGTGGCTAGAGTTCAACACCAATGGATCTGACACC
GACCAAGGTTTTCAACTCACCTATACCAGTTTTGATCTGGTAAAATGTGAGGATCCGGGC 
ATCCCTAACTACGGCTATAGGATCCGTGATGAAGGCCACTTTACCGACACTGTAGTTCTG 
TACAGTTGCAACCCGGGGTACGCCATGCATGGCAGCAACACCCTGACCTGTTTGAGTGGA 
GACAGGAGAGTGTGGGACAAACCACTACCTTCGTGCATAGCGGAATGTGGTGGTCAGAT 

CCATGCAGCCACATCAGGACGAATATTGTCCCCTGGCTATCCAGCTCCGTATGACAACAA
CCTCCACTGCACCTGGATTATAGAGGCAGACCCAGGAAAGACCATTAGCCTCCATTTCAT 
TGTTTTCGACACGGAGATGGCTCACGACATCCTCAAGGTCTGGGACGGGCCGGTGGACAG 
TGACATCCTGCTGAAGGAGTGGAGTGGCTCCGCCCTTCCGGAGGACATCCACAGCACCTT 
CAACTCACTCACCCTGCAGTTCGACAGCGACTTCTTCATCAGCAAGTCTGGCTTCTCCATC 

CAGTTCTCCACCTCAATTGCAGCCACCTGTAACGATCCAGGTATGCCCCAAAATGGCACC
CGCTATGGAGACAGCAGAGAGGCTGGAGACACCGTCACATTCCAGTGTGACCCTGGCTAT 
CAGCTCCAAGGACAAGCCAAAATCACCTGTGTGCAGCTGAATAACCGGTTCTTTTGGCAA 
CCAGACCCTCCTACATGCATAGCTGCTTGTGGAGGGAATCTGACGGGCCCAGCAGGTGTT 
ATTTTGTCACCCAACTACCCACAGCCGTATCCTCCTGGGAAGGAATGTGACTGGAGAGTA 

AAAGTGAACCCGGACTTTGTCATCGCCTTGATATTCAAAAGTTTCAACATGGAGCCCAGC
TATGACTTCCTACACATCTATGAAGGGGAAGATTCCAACAGCCCCCTCATTGGGAGTTAC 
CAGGGCTCTCAGGCCCCAGAAAGAATAGAGAGTAGCGGAAACAGCCTGTTTCTGGCATTT 
CGGAGTGATGCCTCCGTGGGCCTTTCAGGGTTCGCCATTGAATTTAAAOAGAAACCACGG 
GAAGCTTGTTTTGACCCAGGAAATATAATGAATGGGACAAGAGTTGGAACAGACTTCAAG 

CTTGGCTCCACCATCACCTACCAGTGTGACTCTGGCTATAAGATTCTTGACCCCTCATCCA
TCACCTGTGTGATTGGGGCTGATGGGAAACCCTCCTGGGACCAAGTGCTGCCCTCCTGCA 
ATGCTCCCTGTGGAGGCCAGTACACGGGATCAGAAGGGGTAGTTTTATCACCAAACTACC 
CCCATAATTACACAGCTGGTCAAATATGCCTCTATTCCATCACGGTACCAAAGGAATTCG 
TGGTCTTTGGACAGTTTGCCTATTTCCAG 

TGGAACCCATGCACAGGCCAGACTTCTCAGCTCACTCTCGGGGTCTCACTCAGGGGAAAC
ATTGCCCTTGGCTACGTCAAATCAAATTCTGCTCCGATTCAGTGCAAAGAGCGGTGCCTCT 
GCCCGCGGCTTCCACTTCGTGTATCAAGCTGTTCCTCGTACCAGTGACACCCAATGCAGCT 
CTGTCCCCGAGCCCAGATACGGAAGGAGAATTGGTTCTGAGTTTTCTGCCGGCTCCATCG 
TCCGATTCGAGTGCAACCCGGGATACCTGCTTCAGGGTTCCACGGCGCTCCACTGCCAGT 

CCGTGCCCAACGCCTTGGCACAGTGGAACGACACGATCCCCAGCTGTGTGGTACCCTGCA
GTGGCAATTTCACTCAACGAAGAGGTACAATCCTGTCCCCCGGCTACCCTGAGCCATACG 
GAAACAACTTGAACTGTATATGGAAGATCATAGTTACGGAGGGCTCGGGAATTCAGATCC 
AAGTGATCAGTTTTGCCACGGAGCAGAACTGGGACTCCCTTGAGATCCACGATGGTGGGG 
ATGTGACCGCACCCAGACTGGGAAGCTTCTCAGGCACCACAGTACCGGCACTGCTGAACA 

GTACTTCCAACCAACTCTACCTGCATTTCCAGTCTGACATTAGTGTGGCAGCTGCTGGTTT
CCACCTGGAATACAAAACTGTAGGTCTTGCTGCATGCCAAGAACCAGCCCTCCCCAGCAA
CAGCATCAAAATCGGAGATCGGTACATGGTGAACGACGTGCTCTCCTTCCAGTGCGAGCC 
CGGGTACACdCTGCAGGGCCGTTCCCACATTTCCTGTATGCCAGGGACCGTTCGCCGTTG

GAACTATCCGTCTCCCCTGTGCATTGCAACCTGTGGAGGGACGCTGAGCACCTTGGGTGG 

TGTGATCCTGAGCCCCGGCTTCCCAGGTTCTTACCCCAACAACTTAGACTGCACCTGGAG 

GATCTCATTACCCATCGGCTATGGTGCACATATTCAGTITCTGAATTTTTCTACCGAAGCT 

AATCATGACTTCCTTGAAATTCAAAATGGACCTTACCACACCAGGCCCATGATTGGACAA 

TTTAGCGGCACGGATCTCCCCGCGGCCCTGCTGAGCACAACGCATGAAACCCTCATCCAC

TTTTATAGTGACCATTCGCAAAACCGGCAAGGATTTAAACTTGCTTACCAAGCCTATGAA 

TTACAGAACTGTCCAGATCCACCCCCATTTCAGAATGGGTACATGATCAACTCGGATTAC 

AGCGTGGGGCAATCAGTATCTTTCGAGTGTTATCCTGGGTACATTCTAATAGGCCATCCT 

GTCCTCACTTGTCAGCATGGGATCAACAGAAACTGGAACTACCCTTTTCCAAGATGTGAT 

GCCCCTTGTGGGTACAACGTAACTTCTCAGAACGGCACCATCTACTCCCCTGGCTTTCCTG 

ATGAGTATCCGATCCTGAAGGACTGCATTTGGCTCATCACGGTGCCTCCAGGGCACGGAG 

TTTACATCAACTTCACCCTGTTACAGACGGAAGCTGTCAACGATTACATTGCTGTTTGGGA 

CGGTCCCGATCAGAACTCACCCCAGCTGGGAGTTTTCAGTGGCAACACAGCCCTCGAAAC 

GGCGTATAGCTCCACCAACCAAGTCCTGCTCAAGTTCCACAGCGACTTTTCAAATGGAGG 

CTTCTTTGTCCTCAATTTCCACGGTCAGTTGATTTTCACTCCGTTAGTTAAGACTGAGAAT 

TCCATGTGGTGTTTACTGCAGTGTTGTCCCACGCCTTGTTTCCAGCTGAAGTTTCTTGATT 

CAGCCGAGGGCGTGTATGATTCTTTTGCACTGGAGGCCAGCGTTTCCTGTGGTCCTTTTTT 

TGTTTAATGATGTCTTTATTATTTCACATCGTATCCAGCTTGGATTTATTCCAAGATACAT 

GTATCCTAAGTGAAACTCTAAGATGAAGACCATTGAAAGAGATTTGGTACCTTTTATAGA 

TTTACTCATCCCTGTCTCAAGATAAGGTGTTATAGCAAATGTCATGTAACTATAAATGGTG 

TGAAAGCAAACCTCCAATAATCCTGGGAATGCACTCTAAACGATATGTAGAACATCTGTC 

AATCNATCGCTTATCTCTCACGAACACN 
SEQ ID NO:7
5R2_OC147 

AGCTTGTGCCCTTTCCACCTGCATTTCTGATCTAAGTTAGGTAGGGGGCTGCTCTCTGGTCAGCAAGG 
AAGGGAGATCAAAGGATGGAGGCGGGACTCTGCCCCTGCAGAAACCCTCCAGTTTGCTGGAGTTGCCG
GATTACATTGTTCCTCCCCGGTGTGCGGCGTGAGCTTCCCCCACCCGAGCGCCCAAC7VAGTCTCCTTT 
CTCCAGCCTGCGCGCTGCTGCGCTGAGGCCGAATGAAGCGCAGCACGGTGCGGGCAGCCCGAGGCCCC 
GAGGCTGGGCTCTGTCTGTCTGGGACTGCGCCGTGCCCAGCCTCGGTCCCCTCTCTGTGGGTAAGGAT 
GGTTGAGTCCAGCCTCCACGGCAGCGGCTCCTTGTGCCACTAGCAGCCCTTCTTCTGCGCTCTCCGCC 

TTTTCTCTCTAGACTGGATCTCTCCTCCCCCCGCGCCCCCCTCCCCGCATCTCCCACTCGCTGGCTCT
CTCTCCAGCTGCCTCCTCTCCAGGTCTCTCCTGGCTGCGCGCGCTCCTCTCCCCGCTTCTCCCCCTCC 
CGCAGCCTCGCCGCCTTGGTGCCTTCCTGCCCGGCTCGGCCGGCGCTCGTCCCCGGCCCCGGCCCCGC 
CAGCCCGGGTCTCCGCGCTCGGAGCAGCTCAGCCCTGCAGTGGGTCGGGACCCGATGCTATGAGAGGG 
AAGCGAGCCGGGCGCCCAGACCTTCAGGAGGCGTCGGATGCGCGGCGGGTCTTGGGACCGGGCTCTCT 

CTCCGGCTCGCCTTGCCCTCGGGTGATTATTTGGCTCCGCTCATAGCCCTGCCTTCCTCGGAGGAGCC
ATCGGTGTCGCGTGCGTGTGGAGTATCTGCAGACATGACTGCGTGGAGGAGATTCCAGTCGCTGCTCC 
TGCTTCTCGGGCTGCTGGTGCTGTGCGCGAGGCTCCTCACTGCAGCGAAGGGTCAGAACTGTGGAGGC 
TTAGTCCAGGGTCCCAATGGCACTATTGAGAGCCCAGGGTTTCCTCACGGGTATCCGAACTATGCCAA 
CTGCACCTGGATCATCATCACGGGCGAGCGCAATAGGATACAGTTGTCCTTCCATACCTTTGCTCTTG 

AAGAAGATTTTGATATTTTATCAGTTTACGATGGACAGCCTCAACAAGGGAATTTAAAAGTGAGATTA
TCGGGATTTCAGCTGCCCTCCTCTATAGTGAGTACAGGATCTATCCTCACTCTGTGGTTCACGACAGA 
CTTCGCTGTGAGTGCCCAAGGTTTCAAAGCATTATATGAAGTTTTACCTAGCCACACTTGTGGAAATC 
CTGGAGAAATCCTGAAAGGAGTTCTGCATGGAACGAGATTCAACATAGGAGACAAAATCCGGTACAGC 
TGCCTCCCTGGCTACATCTTGGAAGGCCACGCCATCCTGACCTGCATCGTCAGCCCAGGAAATGGTGC 

ATCGTGGGACTTCCCAGCTCCCTTTTGCAGAGCTGAGGGAGCCTGCGGAGGAACCTTACGCGGGACCA
GCAGCTCCATCTCCAGCCCGCACTTCCCTTCAGAGTACGAGAACAACGCGGACTGCACCTGGACCATT 
CTGGCTGAGCCCGGGGACACCATTGCGCTGGTCTTCACTGACTTTCAGCTAGAAGT^AGGATATGATTT 
CTTAGAGATCAGTGGCACGGAAGCTCCATCCATATGGCTAACTGGCATGAACCTCCCCTCTCCAGTTA 
TCAGTAGCAAGAATTGGCTACGACTCCATTTCACCTCTGACAGCAACCACCGACGCAAAGGATTTAAC 

GCTCAGTTCCAAGTGAAAAAGGCGATTGAGTTGAAGTCAAGAGGAGTCAAGATGCTGCCCAGCAAGGA
TGGAAGCCATT^AAAACTCTGTCTGTGAGTCCCTTTCCTTTCTATCTGAGGATTGATACGCCCTTGTAA 
GCAGAGGAGAGAATGGAGCAGTG 
SEQ ID NO:8
5R2_AW 

AGCTTGTGCCCTTTCCACCTGCATTTCTGATCTAAGTTAGGTAGGGGGCTGCTCTCTGGTCAGCAAGG 
AAGGGAGATCAAAGGATGGAGGCGGGACTCTGCCCCTGCAGAAACCCTCCAGTTTGCTGGAGTTGCCG
GATTACATT3GTTCCTCCCCGGTGTGCGGCGTGAGCTTCCCCCACCCGAGCGCCCAACAAGTCTCCTTT 
CTCCAGCCTGCGCGCTGCTGGGCTGAGGCCGAATGAAGCGCAGCACGGTGCGGGCAGCCCGAGGCCCC 
GAGGCTGGGCTCTGTCTGTCTGGGACTGCGCCGTGCCCAGCCTCGGTCCCCTCTCTGTGGGTAAGGAT 
GGTTGAGTCCAGCCTCCACGGCAGCGGCTCCTTGTGCCACTAGCAGCCCTTCTTCTGCGCTCTCCGCC 

TTTTCTCTCTAGACTGGATCTCTCCTCCCCCCGCGCCCCCCTCCCCGCATCTCCCACTCGCTGGCTCT
CTCTCCAGCTGCCTCCTCTCeAGGTCTCTCCTGGCTGCGCGCGCTCCTCTCCCCGCTTCTCCCCCTCC 
CGCAGCCTCGCCGCCTTGGTGCCTTCCTGCCCGGCTCGGCCGGCGCTCGTCCCCGGCCCCGGCCCCGC 
CAGCCCGGGTCTCCGCGCTCGGAGCAGCTCAGCCCTGCAGTGGCTCGGGACCCGATGCTATGAGAGGG 
AAGCGAGCCGGGCGCCCAGACCTTCAGGAGGCGTCGGATGCGCGGCGGGTCTTGGGACCGGGCTCTCT 

CTCCGGCTCGCCTTGCCCTCGGGTGATTATTTGGCTCCGCTCATAGCCCTGCCTTCCTCGGAGGAGCC
ATCGGTGTCGCGTGCGTGTGGAGTATCTGCAGACATGACTGCGTGGAGGAGATTCCAGTCGCTGCTCC 
TGCTTCTCGGGCTGCTGGTGCTGTGCGCGAGGCTCCTCACTGCAGCGAAGGGTCAGAACTGTGGAGGC 
TTAGTCCAGGGTCCCAATGGCACTATTGAGAGCCCAGGGTTTCCTCACGGGTATCCGAACTATGCCAA 
CTGCACCTGGATCATCATCACGGGCGAGCGCAATAGGATACAGTTGTCCTTCCATACCTTTGCTCTTG 

AAGAAGATTTTGATATTTTATCAGTTTACGATGGACAGCCTCAACAAGGGAATTTAAAAGTGAGATTA
TCGGGATTTCAGCTGCCCTCCTCTATAGTGAGTACAGGATCTATCCTCACTCTGTGGTTCACGACAGA 
CTTCGCTGTGAGTGCCCAAGGTTTCAAAGCATTATATGAAGTTTTACCTAGCCACACTTGTGGAAATC 
CTGGAGAAATCCTGAAAGGAGTTCTGCATGGAACGAGATTCAACATAGGAGACAAAATCCGGTACAGC 
TGCCTCCCTGGCTACATCTTGGAAGGCCACGCCATCCTGACCTGCATCGTCAGCCCAGGAAATGGTGC 

ATCGTGGGACTTCCCAGCTCCCTTTTGCAGAGCTGAGGGAGCCTGCGGAGGAACCTTACGCGGGACCA
GCAGCTCCATCTCCAGCCCGCACTTCCCTTCAGAGTACGAGAACAACGCGGACTGCACCTGGACCATT
CTGGCTGAGCCCGGGGACACCATTGCGCTGGTCTTCACTGACTTTCAGCTAGAAGAAGGATATGATTT
CTTAGAGATCAGTGGCACGGAAGCTCCATCCATATGGCTAACTGGCATGAACCTCCCCTCTCCAGTTA 
TCAGTAGCAAGAATTGGCTACGACTCCATTTCACCTCTGAGAGCAACCACCGACGCAAAGGATTTAAC 

GCTCAGTTCCAAGTGAAAAAGGCGATTGAGTTGAAGTCAAGAGGAGTCAAGATGCTGCCCAGCAAGGA
TGGAAGCCATAAAAACTCTGTCTGGCATCAGCAAGAGTTCAGCAAGTGCAGGAAGAAAAAGAGAGAGA 
TCATGACAAGGAATGGGAGAATTTCCCTGACAGCCTCAGGAAACTTGCAGTTTGATAATTAAACAGAT 
CAAGGTCACTCAGATGAGCTGATGGGACATGCTGTGTACGGAGGAGCATTTGCAGTTACAACACTTTG 
TAGCCATGCAGGATGGGGCAATTAATCCAGAACCATTATTTAATAAAAAGATGATTTTTTAAATGTGA 

AA
SEQ ID NO:9

protein sequence 

>ORF: 121..5598 Frame +1

MEAIKTLSGIWNNINHVTSEEDTFIMYLGKPWLQVKIQVSQGGVALVSDMCPDPGIPENGRRAGSDFR
VGANVQFSCEDNYVLQGSKSITCQRVTETLAAWSDHRPICRARTCGSNLRGPSGVITSPNYPVQYEDN
AHCVWVITTTDPDKVIKLAFEEFELERGYDTLTVGDAGKVGDTRSVLYVLTGSSVPDLIVSMSNQMWL

HLQSDDSIGSPGFKAVYQEIEKGGCGDPGIPAYGKRTGSSFLHGDTLTFECPAAFELVGERVITCQQN
NQWSGNKPSCVFSCFFNFTASSGIILSPNYPEEYGNNMNCVWLIISEPGSRIHLIFNDFDVEPQFDFL
AVKDDGISDITVLGTFSGNEVPSQIASSGHIVRLEFQSDHSTTGRGFNITYTTFGQNECHDPGIPING
RRFGDRFLLGSSVSFHCDDGFVKTQGSESITCILQDGNVVWSSTVPRCEAPCGGHLTASSGVILPPGW
PGYYKDSLHCEWIIEAKPGHSIKITFDRFQTEVNYDTLEVRDGPASSSPLIGEYHGTQAPQFLISTGN 

FMYLLFTTDNSRSSIGFLIHYESVTLESDSCLDPGIPVNGHRHGGDFGIRSTVTFSCDPGYTLSDDEP
LVCERNHQWNHALPSCDALCGGYIQGKSGTVLSPGFPDFYPNSLNCTWTIEVSHGKGVQMIFHTFHLE 
SSHDYLLITEDGSFSEPVARLTGSVLPHTIKAGLFGNFTAQLRFISDFSISYEGFNITFSEYDLEPCD 
DPGVPAFSRRIGFHFGVGDSLTFSCFLGYRLEGATKLTCLGGGRRVWSAPLPRCVAECGASVKGNEGT 
LLSPNFPSNYDNNHECIYKIETEAGKGIHLRTRSFQLFEGDTLKVYDGKDSSSRPLGTFTKNELLGLI 

LNSTSNHLWLEFNTNGSDTDQGFQLTYTSFDLVKCEDPGIPNYGYRIRDEGHFTDTVVLYSCNPGYAM
HGSNTLTCLSGDRRVWDKPLPSCIAECGGQIHAATSGRILSPGYPAPYDNNLHCTWIIEADPGKTISL 
HFIVFDTEMAHDILKVWDGPVDSDILLKEWSGSALPEDIHSTFNSLTLQFDSDFFISKSGFSIQFSTS 
IAATCNDPGMPQNGTRYGDSREAGDTVTFQCDPGYQLQGQAKITCVQLNNRFFWQPDPPTCIAACGGK 
LTGPAGVILSPNYPQPYPPGKECDWRVKVNPDFVIALIFKSFNMEPSYDFLHIYEGEDSNSPLIGSYQ 

GSQAPERIESSGNSLFLAFRSDASVGLSGFAIEFKEKPREACFDPGNIMNGTRVGTDFKLGSTITYQC
DSGYKILDPSSITCVIGADGKPSWDQVLPSCNAPCGGQYTGSEGVVLSPNYPHNYTAGQICLYSITVP
KEFVVFGQFAYFQTALNDLAELFDGTHAQARLLSSLSGSHSGETLPLATSNQILLRFSAKSGASARGF
HFVYQAVPRTSDTQCSSVPEPRYGRRIGSEFSAGSIVRFECNPGYLLQGSTALHCQSVPNALAQWNDT 
IPSCVVPCSGNFTQRRGTILSPGYPEPYGNNLNCIWKIIVTEGSGIQIQVISFATEQNWDSLEIHDGG

DVTAPRLGSFSGTTVPALLNSTSNQLYLHFQSDISVAAAGFHLEYKTVGLAACQEPALPSNSIKIGDR
YMVNDVLSFQCEPGYTLQGRSHISCMPGTVRRWNYPSPLCIATCGGTLSTLGGVILSPGFPGSYPNNL
DCTWRISLPIGYGAHIQFLNFSTEANHDFLEIQNGPYHTSPMIGQFSGTDLPAALLSTTHETLIHFYS 
DHSQNRQGFKLAYQAYELQNCPDPPPFQNGYMINSDYSVGQSVSFECYPGYILIGHPP 
SEQ ID NO:10



5G-3V1 Protein sequence 1801 AA

1 MEAIKTLSGI WNNINHVTSE EDTFIMYLGK PWLQVKIQVS QGGVALVSDM
51 CPDPGIPENG RRAGSDFRVG ANVQFSCEDN YVLQGSKSIT CQRVTETLAA
101 WSDHRPICRA RTCGSNLRGP SGVITSPNYP VQYEDNAHCV WVITTTDPDK
151 VIKLAFEEFE LERGYDTLTV GDAGKVGDTR SVLYVLTGSS VPDLIVSMSN
201 QMWLHLQSDD SIGSPGFKAV YQEIEKGGCG DPGIPAYGKR TGSSFLHGDT
251 LTFECPAAFE LVGERVITCQ QNNQWSGNKP SCVFSCFFNF TASSGIILSP
301 NYPEEYGNNM NCVWLIISEP GSRIHLIFND FDVEPQFDFL AVKDDGISDI
351 TVLGTFSGNE VPSQLASSGH IVRLEFQSDH STTGRGFNIT YTTFGQNECH
401 DPGIPINGRR FGDRFLLGSS VSFHCDDGFV KTQGSESITC ILQDGNVVWS
451 STVPRCEAPC GGHLTASSGV ILPPGWPGYY KDSLHCEWII EAKPGHSIKI
501 TFDRFQTEVN YDTLEVRDGP ASSSPLIGEY HGTQAPQFLI STGNFMYLLF
551 TTDNSRSSIG FLIHYESVTL ESDSCLDPGI PVNGHRHGGD FGIRSTVTFS
601 CDPGYTLSDD EPLVCERNHQ WNHALPSCDA LCGGYIQGKS GTVLSPGFPD
651 FYPNSLNCTW TIEVSHGKGV QMIFHTFHLE SSHDYLLITE DGSFSEPVAR
701 LTGSVLPHTI KAGLFGNFTA QLRFISDFSI SYEGFNITFS EYDLEPCDDP
751 GVPAFSRRIG FHFGVGDSLT FSCFLGYRLE GATKLTCLGG GRRVWSAPLP
801 RCVAECGASV KGNEGTLLST NFPSNYDNNH ECIYKIETEA GKGIHLRTRS
851 FQLFEGDTLK VYDGKDSSSR PLGTFTKNEL LGLILNSTSN HLWLEFNTNG
901 SDTDQGFQLT YTSFDLVKCE DPGIPNYGYR IRDEGHFTDT VVLYSCNPGY
951 AMHGSNTLTC LSGDRRVWDK PLPSCIAECG GQIHAATSGR ILSPGYPAPY
1001 DNNLHCTWII EADPGKTISL HFIVFDTEMA HDILKVWDGP VDSDILLKEW
1051 SGSALPEDIH STFNSLTLQF DSDFFISKSG FSIQFSTSIA ATCNDPGMPQ
1101 NGTRYGDSRE AGDTVTFQCD PGYQLQGQAK ITCVQLNNRF FWQPDPPTCI
1151 AACGGNLTGP AGVILSPNYP QPYPPGKECD WRVKVNPDFV IALIFKSFNM
1201 EPSYDFLHIY EGEDSNSPLI GSYQGSQAPE RIESSGNSLF LAFRSDASVG
1251 LSGFAIEFKE KPREACFDPG NIMNGTRVGT DFKLGSTITY QCDSGYKILD
1301 PSSITCVIGA DGKPSWDQVL PSCNAPCGGQ YTGSEGVVLS PNYPHNYTAG
1351 QICLYSITVP KEFVVFGQFA YFQTALNDLA ELFDGTRAQA RLLSSLSGSH
1401 SGETLPLATS NQILLRFSAK SGASARGFHF VYQAVPRTSD TQCSSVPEPR
1451 YGRRIGSEFS AGSIVRFECN PGYLLQGSTA LHCQSVPNAL AQWNDTIPSC
1501 VVPCSGNFTQ RRGTILSPGY PEPYGNNLNC IWKIIVTEGS GIQIQVISFA
1551 TEQNWDSLEI HDGGDVTAPR LGSFSGTTVP ALLNSTSNQL YLHFQSDISV
1601 AAAGFHLEYK TVGLAACQEP ALPSNSIKIG DRYMVNDVLS FQCEPGYTLQ
1651 GRSHISCMPG TVRRWNYPSP LCIATCGGTL STLGGVILSP GFPGSYPNNL
1701 DCTWRISLPI GYGAHIQFLN FSTEANHDFL EIQNGPYHTS PMIGQFSGTD
1751 LPAALLSTTH ETLIHFYSDH SQNRQGFKLA YQGMEQQREP KPKSKYTSYM
1801 *






SEQ ID NO: 11 



5G-3V2 Protein sequence 2009 AA

1 MEAIKTLSGI WNNINHVTSE EDTFIMYLGK PWLQVKIQVS QGGVALVSDM
51 CPDPGIPENG RRAGSDFRVG ANVQFSCEDN YVLQGSKSIT CQRVTETLAA
101 WSDHRPICRA RTCGSNLRGP SGVITSPNYP VQYEDNAHCV WVITTTDPDK
151 VIKLAFEEFE LERGYDTLTV GDAGKVGDTR SVLYVLTGSS VPDLIVSMSN
201 QMWLHLQSDD SIGSPGFKAV YQEIEKGGCG DPGIPAYGKR TGSSFLHGDT
251 LTFECPAAFE LVGERVITCQ QNNQWSGNKP SCVFSCFFNF TASSGIILSP
301 NYPEEYGNNM NCVWLIISEP GSRIHLIFND FDVEPQFDFL AVKDDGISDI
351 TVLGTFSGNE VPSQLASSGH IVRLEFQSDH STTGRGFNIT YTTFGQNECH
401 DPGIPINGRR FGDRFLLGSS VSFHCDDGFV KTQGSESITC ILQDGNVVWS
451 STVPRCEAPC GGHLTASSGV ILPPGWPGYY KDSLHCEWII EAKPGHSIKI
501 TFDRFQTEVN YDTLEVRDGP ASSSPLIGEY HGTQAPQFLI STGNFMYLLF
551 TTDNSRSSIG FLIHYESVTL ESDSCLDPGI PVNGHRHGGD FGIRSTVTFS
601 CDPGYTLSDD EPLVCERNHQ WNHALPSCDA LCGGYIQGKS GTVLSPGFPD
651 FYPNSLNCTW TIEVSHGKGV QMIFHTFHLE SSHDYLLITE DGSFSEPVAR
701 LTGSVLPHTI KAGLFGNFTA QLRFISDFSI SYEGFNITFS EYDLEPCDDP
751 GVPAFSRRIG FHFGVGDSLT FSCFLGYRLE GATKLTCLGG GRRVWSAPLP
801 RCVAECGASV KGNEGTLLSP NFPSNYDNNH ECIYKIETEA GKGIHLRTRS
851 FQLFEGDTLK VYDGKDSSSR PLGTFTKNEL LGLILNSTSN HLWLEFNTNG
901 SDTDQGFQLT YTSFDLVKCE DPGIPNYGYR IRDEGHFTDT VVLYSCNPGY
951 AMHGSNTLTC LSGDRRVWDK PLPSCIAECG GQIHAATSGR ILSPGYPAPY
1001 DNNLHCTWII EADPGKTISL HFIVFDTEMA HDILKVWDGP VDSDILLKEW
1051 SGSALPEDIH STFNSLTLQF DSDFFISKSG FSIQFSTSIA ATCNDPGMPQ
1101 NGTRYGDSRE AGDTVTFQCD PGYQLQGQAK ITCVQLNNRF FWQPDPPTCI
1151 AACGGNLTGP AGVILSPNYP QPYPPGKECD WRVKVNPDFV IALIFKSFNM
1201 EPSYDFLHIY EGEDSNSPLI GSYQGSQAPE RIESSGNSLF LAFRSDASVG
1251 LSGFAIEFKE KPREACFDPG NIMNGTRVGT DFKLGSTITY QCDSGYKILD
1301 PSSITCVIGA DGKPSWDQVL PSCNAPCGGQ YTGSEGVVLS PNYPHNYTAG
1351 QICLYSITVP KEFVVFGQFA YFQTALNDLA ELFDGTHAQA RLLSSLSGSH
1401 SGETLPLATS NQILLRFSAK SGASARGFHF VYQAVPRTSD TQCSSVPEPR
1451 YGRRIGSEFS AGSIVRFECN PGYLLQGSTA LHCQSVPNAL AQWNDTIPSC
1501 VVPCSGNFTQ RRGTILSPGY PEPYGNNLNC IWKIIVTEGS GIQIQVISFA
1551 TEQNWDSLEI HDGGDVTAPR LGSFSGTTVP ALLNSTSNQL YLHFQSDISV
1601 AAAGFHLEYK TVGLAACQEP ALPSNSIKIG DRYMVNDVLS FQCEPGYTLQ
1651 GRSHISCMPG TVRRWNYPSP LCIATCGGTL STLGGVILSP GFPGSYPNNL
1701 DCTWRISLPI GYGAHIQFLN FSTEANHDFL EIQNGPYHTS PMIGQFSGTD
1751 LPAALLSTTH ETLIHFYSDH SQNRQGFKLA YQAYELQNCP DPPPFQNGYM
1801 INSDYSVGQS VSFECYPGYI LIGHPVLTCQ HGINRNWNYP FPRCDAPCGY
1851 NVTSQNGTIY SPGFPDEYPI LKDCIWLITV PPGHGVYINF TLLQTEAVND
1901 YIAVWDGPDQ NSPQLGVFSG NTALETAYSS TNQVLLKFHS DFSNGGFFVL
1951 NFHGQLIFTP LVKTENSMWC LLQCCPTPCF QLKFLDSAEG VYDSFALEAS
2001 VSCGPFFV*
SEQ ID NO:12

5G-3V3 Protein sequence 1784 AA 

1 MEAIKTLSGI WNNINHVTSE EDTFIMYLGK PWLQVKIQVS QGGVALVSDM 

51 CPDPGIPENG RRAGSDFRVG ANVQFSCEDN YVLQGSKSIT CQRVTETLAA 

101 WSDHRPICRA RTCGSNLRGP SGVITSPNYP VQYEDNAHCV WVITTTDPDK 

151 VIKLAFEEFE LERGYDTLTV GDAGKVGDTR SVLYVLTGSS VPDLIVSMSN

201 QMWLHLQSDD SIGSPGFKAV YQEIEKGGCG DPGIPAYGKR TGSSFLHGDT 

251 LTFECPAAFE LVGERVITCQ QNNQWSGNKP SCVFSCFFNF TASSGIILSP 

301 NYPEEYGNNM NCVWLIISEP GSRIHLIFND FDVEPQFDFL AVKDDGISDI 

351 TVLGTFSGNE VPSQLASSGH IVRLEFQSDH STTGRGFNIT YTTFGQNECH 

401 DPGIPINGRR FGDRFLLGSS VSFHCDDGFV KTQGSESITC ILQDGNVVWS

451 STVPRCEAPC GGHLTASSGV ILPPGWPGYY KDSLHCEWII EAKPGHSIKI

501 TFDRFQTEVN YDTLEVRDGP ASSSPLIGEY HGTQAPQFLI STGNFMYLLF 

551 TTDNSRSSIG FLIHYESVTL ESDSCLDPGI PVNGHRHGGD FGIRSTVTFS 

601 CDPGYTLSDD EPLVCERNHQ WNHALPSCDA LCGGYIQGKS GTVLSPGFPD 

651 FYPNSLNCTW TIEVSHGKGV QMIFHTFHLE SSHDYLLITE DGSFSEPVAR 

701 LTGSVLPHTI KAGLFGNFTA QLRFISDFSI SYEGFNITFS EYDLEPCDDP 

751 GVPAFSRRIG FHFGVGDSLT FSCFLGYRLE GATKLTCLGG GRRVWSAPLP

801 RCVAECGASV KGNEGTLLSP NFPSNYDNNH ECIYKIETEA GKGIHLRTRS 

851 FQLFEGDTLK VYDGKDSSSR PLGTFTKNEL LGLILNSTSN HLWLEFNTNG 

901 SDTDQGFQLT YTSFDLVKCE DPGIPNYGYR IRDEGHFTDT VVLYSCNPGY

951 AMHGSNTLTC LSGDRRVWDK PLPSCIAECG GQIHAATSGR ILSPGYPAPY 

1001 DNNLHCTWII EADPGKTISL HFIVFDTEMA HDILKVWDGP VDSDILLKEW 

1051 SGSALPEDIH STFNSLTLQF DSDFFISKSG FSIQFSTSIA ATCNDPGMPQ 

1101 NGTRYGDSRE AGDTVTFQCD PGYQLQGQAK ITCVQLNNRF FWQPDPPTCI 

1151 AACGGNLTGP AGVILSPNYP QPYPPGKECD WRVKVNPDFV IALIFKSFNM 

1201 EPSYDFLHIY EGEDSNSPLI GSYQGSQAPE RIESSGNSLF LAFRSDASVG 

1251 LSGFAIEFKE KPREACFDPG NIMNGTRVGT DFKLGSTITY QCDSGYKILD 

1301 PSSITCVIGA DGKPSWDQVL PSCNAPCGGQ YTGSEGVVLS PNYPHNYTAG

1351 QICLYSITVP KEFVVFGQFA YFQTALNDLA ELFDGTHAQA RLLSSLSGSH 

1401 SGETLPLATS NQILLRFSAK SGASARGFHF VYQAVPRTSD TQCSSVPEPR

1451 YGRRIGSEFS AGSIVRFECN PGYLLQGSTA LHCQSVPNAL AQWNDTIPSC 

1501 VVPCSGNFTQ RRGTILSPGY PEPYGNNLNC IWKIIVTEGS GIQIQVISFA

1551 TEQNWDSLEI HDGGDVTAPR LGSFSGTTVP ALLNSTSNQL YLHFQSDISV 

1601 AAAGFHLEYK TVGLAACQEP ALPSNSIKIG DRYMVNDVLS FQCEPGYTLQ 

1651 GRSHISCMPG TVRRWNYPSP LCIATCGGTL STLGGVILSP GFPGSYPNNL 

1701 DCTWRISLPI GYGAHIQFLN FSTEANHDFL EIQNGPYHTS PMIGQFSGTD

1751 LPAALLSTTH ETLIHFYSDH SQNRQGFKLA YQA* 
SEQ ID NO:13



3V2 Protein sequence 

2353 AA 

1 VGCAAGLGTG XSLRIALPSG DYLAPLIALP SSEEPSVSRA CGVSADMTAW 
51 RRFQSLKLLL GLLVLCARLL TAAKGQNCGG LVQGPNGTIE SPGFPHGYPN 
101 YANCTWIIIT GERNRIQLSF HTFALEEDFD ILSVYDGQPQ QGNLKVRLSG 
151 FQLPSSIVST GSILTLWFTT DFAVSAQGFK ALYEVLPSHT CGNPGEILKG 
201 VLHGTRFNIG DXIRYSCLPG YILEGHAILT CIVSPGNGAS WDFPAPFCRA 
251 EGACGGTLRG TSSSISSPHF PSEYENNADC TWTILAEPGD TIALVFTDFQ 
301 LEEGYDFLEI SGTEAPSIWL TGMNLPSPVI SSKNWLRLHF TSDSNHRRKG 
351 FNAQFQVKKA IELKSRGVKM LPSKDGSHKN SVLSQGGVAL VSDMCPDPGI 
401 PENGRRAGSD FRVGANVQFS CEDNYVLQGS KSITCQRVTE TLAAWSDHRP 
451 ICRARTCGSN LRGPSGVITS PNYPVQYEDN AHCVWVITTT DPDKVIKLAF 
501 EEFELERGYD TLTVGDAGKV GDTRSVLYVL TGSSVPDLIV SMSNQMWLHL 
551 QSDDSIGSPG FKAVYQEIEK GGCGDPGIPA YGKRTGSSFL HGDXLTFECP 
601 AAFELVGERV ITCQQNNQWS GNKPSCVFSC FFNFTASSGI ILSPNYPEEY 
651 GNNMNCVWLI ISEPGSRIHL IFNDFDVEPQ FDFLAVKDDG ISDITVLGTF 
701 SGNEVPSQLA SSGHIVRLEF QSDHSTTGRG XNITYTTFGQ NECHDPGIPI 
751 NGRRFGDRFL LGSSVSFHCD DGFVKTQGSE SITCILQDGN WWSSTVPRC 
801 EAPCGGHLTA SSGVILPPGW PGYYKDSLHC EWIIEAKPGH SIKITFDRFQ 
851 TEVNYDTLEV RDGPASSSPL IGEYHGTQAP QFLISTGNFM YLLFTTDNSR 
901 SSIGFLIHYE SVTLESDSCL DPGIPVNGHR HGGDFGIRST VTFSCDPGYT 
951 LSDDEPLVCE RNHQWNHALP SCDALCGGYI QGKSGTVLSP GFPDFYPNSL 
1001 NCTWTIEVSH GKGVQMIFHT FHLESSHDYL LITEDGSFSE PVARLTGSVL 
1051 PHTIKAGXjFG NFTAQLRFIS DFSISYEGFN ITFSEYDLEP CDDPGVPAFS 
1101 RRIGFHFGVG DSLTFSCFLG YRLEGATKLT CLGGGRRVWS APLPRCVAEC 
1151 GASVKGNEGT LLSPNFPSNY DNNHECIYKI ETEAGKGIHL RTRSFQLFEG 
1201 DTLKVYDGKD SSSRPLGTFT KNELLGLILN STSNHLWLEF NTNGSDTDQG 
1251 FQLTYTSFDL VKCEDPGIPN YGYRIRDEGH FTDTWLYSC NPGYAMHGSN 
1301 TLTCLSGDRR VWDKPLPSCI AECGGQIHAA TSGRILSPGY PAPYDNNLHC 
1351 TWIIEADPGK TISLHFIVFD TEMAHDILKV WDGPVDSDIL LKEWSGSALP 
1401 EDIHSTFNSL TLQFDSDFFI SKSGFSIQFS TSIAATCNDP GMPQNGTRYG 
1451 DSREAGDTVT FQCDPGYQLQ GQAKITCVQL NNRFFWQPDP PTCIAACGGN 
1501 LTGPAGVILS PNYPQPYPPG KECDWRVKVN PDFVIALIFK SFNMEPSYDF 
1551 LHIYEGEDSN SPLIGSYQGS QAPERIESSG NSLFLAFRSD ASVGLSGFAI 
1601 EFKEKPREAC FDPGNIMNGT RVGTDFKLGS TITYQCDSGY KILDPSSITC 
1651 VIGADGKPSW DQVLPSCNAP CGGQYTGSEG VVLSPNYPHN YTAGQICLYS 
1701 ITVPKEFVVF GQFAYFQTAL NDLAELFDGT HAQARLLSSL SGSHSGETLP 
1751 LATSNQILLR FSAKSGASAR GFHFVYQAVP RTSDTQCSSV PEPRYGRRIG 
1801 SEFSAGSIVR FECNPGYLLQ GSTALHCQSV PNALAQWNDT IPSCWPCSG 
1851 NFTQRRGTIL SPGYPEPYGN NLNCIWKIIV TEGSGIQIQV ISFATEQNWD 
1901 SLEIHDGGDV TAPRLGSFSG TTVPALLNST SNQLYLHFQS DISVAAAGFH 
1951 LEYKTVGLAA CQEPALPSNS IKIGDRYMVN DVLSFQCEPG YTLQGRSHIS 
2001 CMPGTVRRWN YPSPLCIATC GGTLSTLGGV ILSPGFPGSY PNNLDCTWRI 
2051 SLPIGYGAHI QFLNFSTEAN HDFLEIQNGP YHTSPMIGQF SGTDLPAALL 
2101 STTHETLIHF YSDHSQNRQG FKLAYQAYEL QNCPDPPPFQ NGYMINSDYS 
2151 VGQSVSFECY PGYILIGHPV LTCQHGINRN WNYPFPRCDA PCGYNVTSGN 
2201 GTIYSPGFPD EYPILKDCIW LITVPPGHGV YINFTLLQTE AVNDYIAVWD 
2251 GPDQNSPQLG VFSGNTALET AYSSTNQVLL KFHSDFSNGG FFVLNFHGQL 
2301 IFTPLVKTEN SMWCLLQCCP TPCFQLKFLD SAEGVYDSFA LEASVSCGPF 
2351 FV* 
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SEQIDNO:14 



PROTEIN SEQUENCE 5R23V2 

LOCUS 5R23V2.PRO 2307 AA PROT UPDATED 05/11/101 
DEFINITION - 
ACCESSION 
KEYWORDS 
SOURCE 

FEATURES From To/Span Description 

Peptide 1 2307 851 to 7771 of 5R23V2 (translated) 

ORIGIN ? 

1 MTAWRRFQSL LLLLGLLVLC ARLLTAAKGQ NCGGLVQGPN GTIESPGFPH GYPNYANCTW 

61 IIITGERNRI QLSFHTFALE EDFDILSVYD GQPQQGNLKV RLSGFQLPSS IVSTGSILTL 

121 WFTTDFAVSA QGFKALYEVL PSHTCGNPGE ILKGVXHGTR FNIGDXIRYS CLPGYILEGH 

181 AILTCIVSPG NGASWDFPAP FCRAEGACGG TLRGTSSSIS SPHFPSEYEN NADCTWTILA 

241 EPGDTIALVF TDFQLEEGYD FLEISGTEAP SIWLTGMNLP SPVISSKNWL RLHFTSDSNH 

301 RRKGFNAQFQ VKKAIELKSR GVKMLPSKDG SHKNSVLSQG GVALVSDMCP DPGIPENGRR 

361 AGSDFRVGAN VQFSCEDNYV LQGSKSITCQ RVTETLAAWS DHRPICRART CGSNLRGPSG 

421 VITSPNYPVQ YEDNAHCVWV ITTTDPDKVI KLAXEEFELE RGYDTLTVGD AGKVGDTRSV 

481 LXVLTGSSVP DLIVSMSNQM WLHLQSDDSI GSPGFKAVYQ EIEKGGCGDP GIPAYGKRTG 

541 SSFLHGDXLT FECPAAFELV GERVITCQQN NQWSGNKPSC VFSCFFNFTA SSGIILSPNY 

601 PEEYGNNMNC VWLIISEPGS RIHLIFNDFD VEPQFDFLAV KDDGISDITV LGTFSGNEVP 

661 SQLASSGHIV RLEFQSDHST TGRGXNITYT TFGQNECHDP GIPINGRRFG DRFLLGSSVS 

721 FHCDDGFVKT QGSESITCIL QDGNWWSST VPRCEAPCGG HLTASSGVIL PPGWPGYYKD 

781 SLHCEWIIEA KPGHSIKITF DRFQTEVNYD TLEVRDGPAS SSPLIGEYHG TQAPQFLIST 

841 GNFMYLLFTT DNSRSSIGFL IHYESVTLES DSCLDPGIPV NGHRHGGDFG IRSTVTFSCD 

901 PGYTLSDDEP LVCERNHQWN HALPSCDALC GGYIQGKSGT VLSPGFPDFY PNSLNCTWTI 

961 EVSHGKGVQM IFHTFHLESS HDYLLITEDG SFSEPVARLT GSVLPHTIKA GLXGNFTAQL 

1021 RFISDFSISY EGFNITFSEY DLEPCDDPGV PAFSRRIGFH FGVGDSLTFS CFLGYRLEGA 

1081 TKLTCLGGGR RVWSAPLPRC VAECGASVKG NEGTLLSPNF PSNYDNNHEC IYKIETEAGK 

1141 GIHLRTRSFQ LFEGDTLKVY DGKDSSSRPL GTFTKNELLG LILNSTSNHL WLEFNTNGSD 

1201 TDQGFQLTYT SFDLVKCEDP GIPNYGYRIR DEGHFTDTW LYSCNPGYAM HGSNTLTCLS 

1261 GDRRVWDKPL PSCIAECGGQ IHAATSGRIL SPGYPAPYDN NLHCTWIIEA DPGKTISLHF 

1321 IVFDTEMAHD ILKVWDGPVD SDILLKEWSG SALPEDIHST FNSLTLQFDS DFFISKSGFS 

1381 IQFSTSIAAT CNDPGMPQNG TRYGDSREAG DTVTFQCDPG YQLQGQAKIT CVQLNNRFFW 

1441 QPDPPTCIAA CGGNLTGPAG VILSPNYPQP YPPGKECDWR VKVNPDFVIA LIFKSFNMEP 

1501 SYDFLHIYEG EDSNSPLIGS YQGSQAPERI ESSGNSLFLA FRSDASVGLS GFAIEFKEKP 

1561 REACFDPGNI MNGTRVGTDF KLGSTITYQC DSGYKILDPS SITCVIGADG KPSWDQVLPS 

1621 CNAPCGGQYT GSEGVVLSPN YPHNYTAGQI CLYSITVPKE FVVFGQFAYF QTALNDLAEL 

1681 FDGTHAQARL LSSLSGSHSG ETLPLATSNQ ILLRFSAKSG ASARGFHFVY QAVPRTSDTQ 

1741 CSSVPEPRYG RRIGSEFSAG SIVRFECNPG YLLQGSTALH CQSVPNALAQ WNDTIPSCW 

1801 PCSGNFTQRR GTILSPGYPE PYGNNLNCIW KIIVTEGSGI QIQVISFATE QNWDSLEIHD 

1861 GGDVTAPKLG SFSGTTVPAL LNSTSNQLYL HFQSDISVAA AGFHLEYKTV GLAACQEPAL 

1921 PSNSIKIGDR YMVNDVLSFQ CEPGYTLQGR SHISCMPGTV RRWNYPSPLC IATCGGTLST 

1981 LGGVILSPGF PGSYPNNLDC TWRISLPIGY GAHIQFLNFS TEANHDFLEI QNGPYHTSPM 

2041 IGQFSGTDLP AALLSTTHET LIHFYSDHSQ NRQGFKLAYQ AYELQNCPDP PPFQNGYMIN 

2101 SDYSVGQSVS FECYPGYILI GHPVLTCQHG INRNWNYPFP RCDAPCGYNV TSQNGTIYSP 

2161 GFPDEYPILK DCIWLITVPP GHGVYINFTL LQTEAVNDYI AVWDGPDQNS PQLGVFSGNT 

2221 ALETAYSSTN QVLLKFHSDF SNGGFFVLNF HGQLIFTPLV KTENSMWCLL QCCPTPCFQL 

2281 KFLDSAEGVY DSFALEASVS CGPFFV* 
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SEQIDNO:15 



5R2_OC147 PROTEIN 

LOCUS TRANSLATIO 347 AA PROT UPDATED 05/11/101 

DEFINITION - 

ACCESSION 

KEYWORDS 

SOURCE 

FEATURES From To/Span Description 

Peptide 1 347 851 to 1891 of 5r2_oc147 (translated) 

ORIGIN ? 

1 MTAWRRFQSL LLLLGLLVLC ARLLTAAKGQ NCGGLVQGPN GTIESPGFPH GYPNYANCTW 
61 IIITGERNRI QLSFHTFALE EDFDILSVYD GQPQQGNLKV RLSGFQLPSS IVSTGSILTL 
121 WFTTDFAVSA QGFKALYEVL PSHTCGNPGE ILKGVLHGTR FNIGDKIRYS CLPGYILEGH 
181 AILTCIVSPG NGASWDFPAP FCRAEGACGG TLRGTSSSIS SPHFPSEYEN NADCTWTILA 
241 EPGDTIALVF TDFQLEEGYD FLEISGTEAP SIWLTGMNLP SPVISSKNWL RLHFTSDSNH 
301 RRKGFNAQFQ VKKAIELKSR GVKMLPSKDG SHKNSVCESL SFLSED* 
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SEQIDNO:16 



5R2_AW PROTEIN 

LOCUS 5R2_AW_PRO 372 AA PROT UPDATED 05/11/101 

DEFINITION - 
ACCESSION 
KEYWORDS 
SOURCE 

FEATURES From To/Span Description 

Peptide 1 372 851 to 1966 of 5r2_aw (translated) 

ORIGIN ? 

1 MTAWRRFQSL LLLLGLLVLC ARLLTAAKGQ NCGGLVQGPN GTIESPGFPH GYPNYANCTW 
61 IIITGERNRI QLSFHTFALE EDFDILSVYD GQPQQGNLKV RLSGFQLPSS IVSTGSILTL 
121 WFTTDFAVSA QGFKALYEVL PSHTCGNPGE ILKGVLHGTR FNIGDKIRYS CLPGYILEGH 
181 AILTCIVSPG NGASWDFPAP FCRAEGACGG TLRGTSSSIS SPHFPSEYEN NADCTWTILA 
241 EPGDTIALVF TDFQLEEGYD FLEISGTEAP SIWLTGMNLP SPVISSKNWL RLHFTSDSNH 
301 RRKGFNAQFQ VKKAIELKSR GVKMLPSKDG SHKNSVWHQQ EFSKCRKKKR EIMTRNGRIS 
361 LTASGNLQFD N* 

// 

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